Abstract. In this paper we offer a complete methodology for sufficient dimension reduction called the test function (TF). TF provides a new family of methods for the estimation of the central subspace (CS) based on the introduction of a nonlinear transformation of the response. The theoretical background of TF is developed under weaker conditions than the existing methods. By considering order 1 and 2 conditional moments of the predictor given the response, we divide TF into two classes. In each class we provide conditions that guarantee an exhaustive estimation of the CS. Besides, the optimal members are calculated via the minimization of the asymptotic mean squared error deriving from the distance between the CS and its estimate. This leads us to two plug-in methods which are evaluated with several simulations.

AMS 2000 subject classifications: Primary 62G08; secondary 62H11, 62H05.
1. Introduction

Dimension reduction in regression aims at improving the poor convergence rates derived from the nonparametric estimation of the regression function in large dimension. It attempts to provide methods that challenge the curse of dimensionality by reducing the number of predictors. A specific dimension reduction framework, called sufficient dimension reduction (SDR), has drawn attention in the last few years. Let $Y$ be a random variable and $X$ a $p$-dimensional random vector. To reduce the number of predictors, it is proposed to replace $X = (X_1, \dots, X_p)^T$ by a number smaller than $p$ of linear combinations of the predictors. The new covariate vector has the form $PX$, where $P$ can be chosen as an orthogonal projection on a subspace $E$ of $\mathbb{R}^p$. Clearly, this kind of method relies on a trade-off between the dimension of $E$, which needs to be as small as possible, and the preservation of the information carried by $X$ about $Y$ through the projection on $E$. In the SDR literature, mainly two kinds of spaces have been studied. First, a dimension reduction subspace (DRS) [Li (1991)] is defined by the conditional independence property

(1.1) $Y \perp\!\!\!\perp X \mid P_c X,$

where $P_c$ is the orthogonal projection on a DRS. In words, it means that knowing $P_c X$, there is no more information carried by $X$ about $Y$. It is possible to show that (1.1) is equivalent to

(1.2) $P(Y \in A \mid X) = P(Y \in A \mid P_c X),$

for any measurable set $A$. Moreover, under some additional conditions [Cook (1998)], the intersection of all the DRS is itself a DRS. Consequently, there exists a unique DRS with minimal dimension and we call it the central subspace (CS). In this article the CS is noted $E_c$. Secondly, another space called a mean dimension reduction subspace (MDRS) has been defined in Cook and Li (2002) as

(1.3) $E[Y \mid X] = E[Y \mid P_m X],$

where $P_m$ is the orthogonal projection on a MDRS. Clearly, the existence of a MDRS requires a weaker assumption than the existence of a DRS and therefore it seems to be more appropriate to the context of regression. Because of the equivalent formulation of (1.3),

(1.4) $Y \perp\!\!\!\perp E[Y \mid X] \mid P_m X,$

the definition of a MDRS imposes that all the dependence between $Y$ and its regression function on $X$ is carried by $P_m X$. If the intersection of all the MDRS is itself a MDRS, then it is called the central mean subspace (CMS) [Cook and Li (2002)]. In the following the CMS is noted $E_m$. Finally, notice that because a DRS is a MDRS, the CS contains the CMS.
There exist many methods for estimating the CS and the CMS and these methods can be divided into two groups, those which require some assumptions on the distribution of the covariates and those which do not. The second group includes structure adaptive method (SAM) [Hristache, Juditsky, Polzehl, and Spokoiny (2001)], minimum average variance estimation (MAVE) [Xia, Tong, Li, and Zhu (2002)], and structural adaptation via maximum minimization (SAMM) [Dalalyan, Juditsky, and Spokoiny (2008)]. Those methods are free from conditions on the predictors but require a nonparametric estimation of the regression function $E[Y \mid X = x]$. In this article we are concerned only with methods of the first group and we quote them in the following.

To simplify the presentation, from now on we work in terms of the standardized covariate $Z = \Sigma^{-1/2}(X - E[X])$ with $\Sigma = \mathrm{var}(X)$. Hence we define the standardized CS as $\Sigma^{1/2} E_c$. Since there is no ambiguity, we still note it $E_c$ and we still denote by $P_c$ the orthogonal projection on it. For any matrix $M$, we note $\mathrm{span}(M)$ the space generated by the columns of $M$.
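
The standardization step can be sketched numerically as follows. This is a minimal illustration and not part of the original text: the function name `standardize` and the simulated data are assumptions, and the population quantities $E[X]$ and $\Sigma$ are replaced by their empirical counterparts.

```python
import numpy as np

def standardize(X):
    """Return Z = Sigma^{-1/2} (X - mean), using the empirical mean and covariance."""
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    # Symmetric inverse square root via the eigendecomposition of sigma.
    eigval, eigvec = np.linalg.eigh(sigma)
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return Xc @ inv_sqrt

# Hypothetical correlated predictors, used only to exercise the function.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(500, 4)) @ A + 3.0
Z = standardize(X)
```

After this step the empirical mean of `Z` is zero and its empirical covariance is the identity, which is the setting assumed for $Z$ in the rest of the paper.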

All the methods of the first group derive from the principle of inverse regression: instead of studying the regression curve, which implies high dimensional estimation problems, the study is based on the inverse regression curve $E[Z \mid Y = y]$ or the inverse variance curve $\mathrm{var}(Z \mid Y = y)$. To infer about the CS, order 1 moment based methods require that

Assumption 1. (Linearity condition)

$Q_c E[Z \mid P_c Z] = 0$ a.s.,

where $Q_c = I - P_c$. Under the linearity condition and the existence of the CS, it follows that $E[Z \mid Y] \in E_c$ a.s. and then if we divide the range of $Y$ into $H$ slices $I(h)$, we have for every $h$,

(1.5) $m_h = E[Z \mid Y \in I(h)] \in E_c,$

and clearly, the space generated by some estimators of the $m_h$'s estimates the CS, or a subspace of it. To obtain a basis of this subspace, Li (1991) proposed a principal component analysis and this leads to an eigendecomposition of the matrix

(1.6) $\widetilde{M}_{SIR} = \sum_h p_h m_h m_h^T,$

where $p_h = P(Y \in I(h))$. Many methods relying on the inverse regression curve, such as sliced inverse regression (SIR) [Li (1991)], have been developed. Other ways to estimate the inverse regression curve are investigated in kernel inverse regression (KIR) [Zhu and Fang (1996)] and parametric inverse regression (PIR) [Bura (1997)]. Instead of a principal component analysis, the minimization of a discrepancy function is studied in inverse regression estimator (IRE) [Cook and Ni (2005)] to obtain a basis of the CS. For a complete background about order 1 methods, we refer to Cook and Ni (2005).
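
The slicing construction of (1.5) and (1.6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sir_matrix`, the equal-count slicing rule and the toy model $Y = Z_1 + \text{noise}$ are all hypothetical choices, and the predictors are simulated as standard Gaussian so that no further standardization is applied.

```python
import numpy as np

def sir_matrix(Z, y, n_slices=10):
    """Empirical version of sum_h p_h m_h m_h^T with slices of roughly equal counts."""
    n, p = Z.shape
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        p_h = len(idx) / n            # empirical slice probability
        m_h = Z[idx].mean(axis=0)     # empirical slice mean of Z
        M += p_h * np.outer(m_h, m_h)
    return M

rng = np.random.default_rng(1)
n, p = 5000, 5
Z = rng.normal(size=(n, p))                 # standardized in distribution
y = Z[:, 0] + 0.1 * rng.normal(size=n)      # toy model: CS spanned by e_1
eigval, eigvec = np.linalg.eigh(sir_matrix(Z, y))
leading = eigvec[:, -1]                     # eigenvector of the largest eigenvalue
```

In this toy model the leading eigenvector is expected to be close to $\pm e_1$, the single direction of the CS.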



Otherwise, in addition to the linearity condition, order 2 moment based methods require that

Assumption 2. (Constant conditional variance (CCV))

$\mathrm{var}(Z \mid P_c Z) = Q_c$ a.s.

Then under the linearity condition, CCV and the existence of the CS, it follows that $\mathrm{span}(\mathrm{var}(Z \mid Y) - I) \subset E_c$ a.s. and by considering a slicing of the response, we have

(1.7) $\mathrm{span}(v_h - I) \subset E_c,$

where $v_h = \mathrm{var}(Z \mid Y \in I(h))$. Since the spaces generated by the matrices $(v_h - I)$ are included in the CS, sliced average variance estimation (SAVE) in Cook and Weisberg (1991) proposed to make an eigendecomposition of the matrix

$M_{SAVE} = \sum_h p_h (v_h - I)^2,$

to derive a basis of the CS. Another combination of matrices based on the inverse variance curve is developed in sliced inverse regression-II (SIR-II) [Li (1991)]. More recently, contour regression (CR) [Li, Zha, and Chiaromonte (2005)] and directional regression (DR) [Li and Wang (2007)] investigate a new kind of estimator based on empirical directions. Besides, methods for estimating the CMS also require Assumptions 1 and 2. They include principal Hessian direction (pHd) [Li (1992)] and iterative Hessian transformation (IHT) [Cook and Li (2002)]. To overcome the failure of certain methods when facing pathological models and keep their efficiency in other cases, some combinations of the previous methods, such as SIR and SIR-II, SIR and pHd, or SIR and SAVE, have been studied in Gannoun and Saracco and in Ye and Weiss (2003).
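
The SAVE matrix introduced above admits a similar numerical sketch. As before, this is only an illustration under assumptions: the name `save_matrix`, the equal-count slicing and the toy model $Y = Z_1^2 + \text{noise}$ (a link that is symmetric in $Z_1$) are hypothetical choices.

```python
import numpy as np

def save_matrix(Z, y, n_slices=10):
    """Empirical version of sum_h p_h (v_h - I)^2 with slices of roughly equal counts."""
    n, p = Z.shape
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        p_h = len(idx) / n
        v_h = np.cov(Z[idx], rowvar=False)   # empirical var(Z | Y in I(h))
        D = v_h - np.eye(p)
        M += p_h * D @ D
    return M

rng = np.random.default_rng(4)
n, p = 5000, 5
Z = rng.normal(size=(n, p))
y = Z[:, 0] ** 2 + 0.1 * rng.normal(size=n)  # symmetric link, invisible to slice means
eigval, eigvec = np.linalg.eigh(save_matrix(Z, y))
leading = eigvec[:, -1]
```

Here the slice means carry essentially no information about $e_1$, while the slice variances do, so the leading eigenvector of the SAVE matrix is expected to align with $\pm e_1$.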



As we have just highlighted, Assumptions 1 and 2 are necessary to characterize the CS with the inverse regression curve and the inverse variance curve, respectively. A first point is that the linearity condition and CCV assumed together are really close to an assumption of normality on the predictors. Moreover, for each quoted method, these assumptions guarantee only that the estimated CS is included asymptotically in the true CS. A crucial point in the SDR literature and a recent new challenge is to propose some methods that allow an exhaustive estimation of the CS under mild conditions. Some recent research is concerned with this problem: Li, Zha, and Chiaromonte (2005) and Li and Wang (2007) proposed a new kind of assumptions that guarantee the exhaustivity.

There exists a large range of methods aiming at the estimation of the CS. In this paper, we try to propose a general point of view about SDR by introducing the test function method (TF). The original basic idea of TF is to investigate the dependence between $Z$ and $Y$ by introducing nonlinear transformations of $Y$, and inferring about the CS through their covariances with $Z$ or $ZZ^T$. Actually, an important difference between TF and other methods is that neither the inverse regression curve nor the inverse variance curve is estimated, as is suggested by equations (1.5) and (1.7). In this paper, these two curves are a working tool but the inference about the CS is obtained through some covariances. More precisely, the CS is obtained either by inspection of the range of

$E[Z\psi(Y)],$

when $\psi$ varies in a well chosen finite family of functions, or by an eigendecomposition of

$E[ZZ^T\psi(Y)],$

where $\psi$ is a well chosen function. Hence two kinds of methods can be distinguished, the order 1 test function methods (TF1) and the order 2 test function methods (TF2). Notice that $\widetilde{M}_{SIR}$ is an estimate of $E[Z E[Z \mid Y]^T]$, hence SIR may be seen as a particular case of TF1. In this paper, we show that TF allows us to relax some hypotheses commonly assumed in the literature; in particular we alleviate the CCV hypothesis for TF2. Moreover, for each method, we provide mild conditions ensuring an exhaustive characterization of the CS. Finally, an asymptotic variance analysis leads us to the optimal transformation of $Y$ for the estimation of the CS. As a result a significant improvement in accuracy is targeted by TF. The present work is divided into the three following principal parts:

• Existence of the CS 

• Exhaustivity of TF 

• Optimality for TF 



More precisely, it is organized as follows. In section 2 we investigate some new conditions ensuring the existence of the CS and the CMS. In section 3 we introduce TF1 and TF2 by providing some basic results. Conditions for an exhaustive characterization of the CS are presented in section 4. The choice of the optimal transformation of the response for TF1 and TF2 is detailed in section 5. Accordingly, we propose two plug-in methods deriving from the minimization of the MSE. And finally, in section 6 we compare both methods to existing ones through simulations.

2. Unicity of the central subspace and the central mean subspace 

Conditions on the unicity of subspaces that allow a dimension reduction are investigated in this section. This problem drew attention early in the literature but it seems not to be the case anymore. As a consequence of the definition of the CS (resp. CMS), its existence is equivalent to the unicity of a DRS (resp. MDRS) with minimal dimension. In Cook (1998), Proposition 6.4 p. 108, it is shown that the existence of the CS can be obtained by constraining the distribution of $X$. More precisely, the CS exists under the assumption that $X$ has a convex density support. Moreover, in Cook and Li (2002), the existence of the CMS is ensured under the same condition as the CS. We prove in Theorem 2.2 and Corollary 2.3 below that the convexity assumption can be significantly weakened. Here, the standardization of the predictors does not change the presentation of our results, hence we present them for $X$. For a comprehensive proof of our theorems we need the following lemma.

Lemma 2.1. If the restriction of $X$ to the ball of $\mathbb{R}^p$ with radius $r$ and center $x_0$ has a strictly positive density, then the intersection of all the MDRS is a MDRS on this ball, i.e.

$\left(E[Y \mid X] - E[Y \mid RX]\right) 1_{\{X \in B(x_0, r)\}} = 0,$

where $R$ denotes the orthogonal projection onto the intersection of all MDRS.

Proof. It suffices to show the lemma for two MDRS. We first make the proof for a ball centered at 0, and then we apply it to $X - x_0$. Let $E$ and $E'$ be two MDRS and $P$ and $P'$ their respective orthogonal projections. Denote by $R$ the orthogonal projection onto the subspace $E \cap E'$. Using the definition of a MDRS,

$E[Y \mid X] = E[Y \mid PX] = E[Y \mid P'X]$ a.s.

Let $g(PX)$ and $h(P'X)$ denote the last two functions of the preceding equation. Using that $X$ has a strictly positive density on the ball $B(0, r)$, we can write

(2.1) $g(Px) = h(P'x)$ a.e. on $B(0, r)$.

Let $\epsilon > 0$, and $\psi_k$ be a unit approximation with compact support $B(0, \epsilon)$; we define the function $f_k : B(0, r) \to \mathbb{R}$ such that

$f_k(x) = ((g \circ P) * \psi_k)(x).$

Then, we have for all $x$,

$f_k(x) = \int g(P(x - y))\, \psi_k(y)\, dy = f_k(Px).$

Moreover, for all $x \in B(0, r - \epsilon)$, since in the above integral $x - y \in B(0, r)$, using (2.1) we derive

$f_k(x) = ((h \circ P') * \psi_k)(x),$

and similarly we obtain $f_k(x) = f_k(P'x)$. Since $f_k(x) = f_k(Px) = f_k(P'x)$, a simple iteration process provides for all $x \in B(0, r - \epsilon)$,

$f_k(x) = f_k((PP')^n x).$

Since $f_k$ is a continuous function and $(PP')^n \to R$ as $n \to \infty$,

$f_k(x) = f_k(Rx), \quad x \in B(0, r - \epsilon).$

To conclude, the unit approximation theorem gives us the convergence

$f_k \circ R \to g \circ P.$

Thus, from $f_k(RX)$ we can derive a subsequence $f_{n_k}(RX)$ that converges almost surely to $g(PX)$, proving that $E[Y \mid X]$ is a function of $RX$. This completes the first part of the proof.

Now suppose that $X$ has a strictly positive density on the ball of radius $r$ and center $x_0$. Define $\widetilde{X} = X - x_0$; it is clear that a MDRS for $X$ is also a MDRS for $\widetilde{X}$ and conversely. Then, since $\widetilde{X}$ is centered at 0, the intersection of two MDRS is still a MDRS for $\widetilde{X}$ and obviously for $X$. □
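
The convergence $(PP')^n \to R$ used in the proof (alternating projections onto two subspaces converge to the projection onto their intersection) can be checked on a small example. This is only a numerical illustration with hypothetical subspaces of $\mathbb{R}^3$; the helper `projector` is not from the original text.

```python
import numpy as np

def projector(B):
    """Orthogonal projector onto the column space of B."""
    Q, _ = np.linalg.qr(B)
    return Q @ Q.T

e = np.eye(3)
P = projector(e[:, [0, 1]])                                          # plane span(e1, e2)
P_prime = projector(np.column_stack([e[:, 0], e[:, 1] + e[:, 2]]))   # plane span(e1, e2 + e3)
R = projector(e[:, [0]])                                             # intersection: span(e1)

# Iterate the product of the two projections.
power = np.linalg.matrix_power(P @ P_prime, 50)
gap = np.abs(power - R).max()
```

In this example the two planes meet along $\mathrm{span}(e_1)$, and the entries of $(PP')^n$ orthogonal to $e_1$ decay geometrically (by a factor $1/2$ per iteration), so `gap` is negligible after 50 iterations.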

The following theorem provides us the existence of the CMS under a weaker condition than Cook (1998). The same result on the existence of the CS is presented in a corollary that follows the theorem.

Theorem 2.2. If X has a density such that the Lebesgue measure of the boundary of its support 
is equal to 0, then the CMS exists. 

Proof. Denote by $F \subset \mathbb{R}^p$ the support of the density of $X$. A first step consists in showing that its interior $\mathring{F}$ can be covered by a countable number of balls included in $\mathring{F}$. Secondly, we apply Lemma 2.1 to each of these balls to obtain that the intersection of two MDRS on $\mathring{F}$ is a MDRS on $\mathring{F}$. Finally, the unicity is shown.

Let $x \in \mathring{F}$, then there exists $r > 0$ such that $B(x, r) \subset \mathring{F}$. It is possible to find a ball with rational center and radius included in $B(x, r)$ and containing $x$. Thus any $x$ of $\mathring{F}$ is contained in a ball with rational center and radius included in $\mathring{F}$. In other words, the set $\mathcal{A}$ formed by all the balls $B(q, r_q) \subset \mathring{F}$ with $q$ and $r_q$ rational covers $\mathring{F}$. Therefore, by applying Lemma 2.1 we have for all $B(q, r_q) \in \mathcal{A}$,

$\left|E[Y \mid X] - E[Y \mid RX]\right| 1_{\{X \in B(q, r_q)\}} = 0.$

Since $\mathcal{A}$ is a countable set,

$\sum_{(q, r_q) \in \mathcal{A}} \left|E[Y \mid X] - E[Y \mid RX]\right| 1_{\{X \in B(q, r_q)\}} = 0,$

then,

$\left|E[Y \mid X] - E[Y \mid RX]\right| \sum_{(q, r_q) \in \mathcal{A}} 1_{\{X \in B(q, r_q)\}} = 0.$

By assumption $P(X \in \mathring{F}) = 1$, so the sum of indicators on the left-hand side is almost surely strictly positive, and thus

$E[Y \mid X] = E[Y \mid RX]$ a.s.

Consequently, the intersection of two MDRS is a MDRS. To complete the proof, all the MDRS with minimal dimension have the same dimension and their intersection is still a MDRS with minimal dimension. Hence a MDRS with minimal dimension is unique and the CMS exists. □

Corollary 2.3. If X has a density such that the Lebesgue measure of the boundary of its support 
is equal to 0, then the CS exists. 

Proof. Suppose there exist two different DRS with minimal dimension. By equations (1.2) and (1.3), these DRS are MDRS for the random variables $1_{\{Y \in A\}}$ and $X$, for any measurable set $A$. Because we can apply Theorem 2.2, this is impossible. □
3. Test function methodology and assumptions

In the previous section, we focused on conditions that guarantee the existence of the CS and the CMS under the respective Assumptions (1.1) and (1.4). Since TF is only concerned with the CS estimation, we assume from now on that $X$ satisfies the condition of Corollary 2.3. As detailed in the introduction, the estimation of the CS raises two kinds of conditions: those that guarantee a characterization of the CS, and those that permit covering the entire subspace. In this section we are concerned with the first kind. Moreover, we explain our next results in a simple way using the standardized covariates. We denote by $d$ the dimension of $E_c$.

3.1. Order 1 test function. Model (1.1) implies that all the information held by $Z$ about $Y$ is carried by $P_c Z$. To find $E_c$, as pointed out by Li (1991) and explained in many articles on the subject, a natural idea is to focus on the inverse regression curve $E[Z \mid Y]$. Actually if (1.1) holds, we can write the inverse regression curve as $E[E[Z \mid P_c Z] \mid Y]$. Clearly, the linearity condition implies that $E[Z \mid P_c Z] \in E_c$ and then $E[Z \mid Y]$ is with probability 1 a vector of $E_c$. To our knowledge, all the order 1 methods target an estimation of the subspace drawn by $E[Z \mid Y]$. As described in the introduction, TF1 does not rely on this idea but also requires the linearity condition. Let us say a few words about this assumption.

Remark 1. The linearity condition is often equated with an assumption of sphericity on the distribution of the predictor. It is well known that if $Z$ is spherical then it satisfies the linearity condition but the converse is false. Actually, the linearity condition and sphericity are not so closely related: in Eaton (1986), it is shown that a random variable $Z$ is spherical if and only if $E[QZ \mid PZ] = 0$ for every rank 1 projection $P$ and $Q = I - P$. Clearly, at this stage, sphericity seems to be too strong a restriction to obtain the linearity condition. However, unlike sphericity, since we do not know $P_c$ the linearity condition cannot be checked. An assumption closely related to the linearity condition is to ask the distribution of $Z$ to be invariant by the orthogonal symmetry with respect to the space $E_c$, i.e. $Z \overset{d}{=} (2P_c - I)Z$. Then for any measurable function $f$,

$E[Q_c Z f(P_c Z)] = -E[Q_c Z f(P_c Z)],$

which implies the linearity condition. Recalling that sphericity means invariance in distribution by every orthogonal transformation, we have just shown that an invariance in distribution by a particular one suffices to get the linearity condition. Moreover, the assumption of sphericity suffers from the fact that if we add to $Z$ some independent components then the resulting vector is no longer spherical, whereas the linearity condition is still satisfied.

A way to introduce TF1 is to consider some relevant facts of the SIR estimation. As explained in the introduction, SIR consists in estimating the matrix

$M_{SIR} = E[Z E[Z \mid Y]^T],$

whose column space is included in the CS. To make that possible, a slicing approximation of the conditional expectation $E[Z \mid Y]$ is conducted and it leads to $\widetilde{M}_{SIR}$ of equation (1.6). Because $p_h > 0$, it is clear that

(3.1) $\mathrm{span}(\widetilde{M}_{SIR}) = \mathrm{span}\left(E[Z 1_{\{Y \in I(h)\}}],\ h = 1, \dots, H\right),$

and it follows that SIR estimates a subspace spanned by the covariances between $Z$ and a family of $Y$-measurable functions. The first goal of TF1 is to extend SIR to other families of functions by estimating $E_c$ through $\mathrm{span}(E[Z\psi(Y)],\ \psi \in \Psi_H)$. Moreover, notice that

(3.2) $\widetilde{M}_{SIR} = E\left[Z\,(\psi_1(Y), \dots, \psi_p(Y))\right],$

where $\psi_k(y) = \sum_h \alpha_{k,h} 1_{\{y \in I(h)\}}$ and $\alpha_{k,h} = E[Z_k \mid Y \in I(h)]$. It follows from (3.2) and (3.1) that

$\mathrm{span}\left(E[Z 1_{\{Y \in I(h)\}}],\ h = 1, \dots, H\right) = \mathrm{span}\left(E[Z\psi_k(Y)],\ k = 1, \dots, p\right),$

and clearly SIR synthesizes the information contained in a subspace generated by $H$ vectors into one generated by $p$ vectors. Although these spaces are equal, it is not the case for their respective estimators. Accordingly, another issue for TF1 is to choose the $p$ functions $\psi_k$ in order to minimize the variance of the estimator.

The following theorem is not really new. Yet, it makes a simple link between TF1 and the CS. An important fact is that Theorem 3.1 provides a vector in $E_c$ for every measurable function.

Theorem 3.1. Assume that $Z$ satisfies Assumption 1 and has a finite first moment. Then, for every measurable function $\psi : \mathbb{R} \to \mathbb{R}$ such that $E[Z\psi(Y)] < \infty$, we have

$E[Z\psi(Y)] \in E_c.$

Proof. Thanks to the existence of the CS, $E[Z\psi(Y)] = E\left[E[Z \mid P_c Z]\,\psi(Y)\right]$, and thanks to the linearity condition, $Q_c E[Z\psi(Y)] = 0$. □
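
Theorem 3.1 can be illustrated by simulation. The sketch below is an assumption-laden toy example, not taken from the original text: $Z$ is standard Gaussian (so the linearity condition holds), the model is $Y = Z_1 + \text{noise}$ so that $E_c = \mathrm{span}(e_1)$, and the three bounded test functions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20_000, 4
Z = rng.normal(size=(n, p))                  # Gaussian Z satisfies the linearity condition
y = Z[:, 0] + 0.5 * rng.normal(size=n)       # toy model with E_c = span(e_1)

# A few bounded test functions (hypothetical choices for illustration).
test_functions = {
    "tanh": np.tanh,
    "sin": np.sin,
    "indicator": lambda t: (t > 0.5).astype(float),
}

# Row k holds the empirical version of E[Z psi_k(Y)].
vectors = np.array([Z.T @ psi(y) / n for psi in test_functions.values()])
outside = np.abs(vectors[:, 1:]).max()       # largest coordinate orthogonal to E_c
```

Each empirical vector is expected to have a clearly nonzero first coordinate, while its remaining coordinates are of the order of the Monte Carlo error, consistent with $E[Z\psi(Y)] \in E_c$.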

3.2. Order 2 test function. TF2 relies on exactly the same approach as TF1, with the difference that it involves higher conditional moments of $Z$ given $Y$. Indeed, we are interested in the space generated by the columns of the matrix $E[ZZ^T\psi(Y)]$ where $\psi$ denotes a measurable function. The same issues are addressed: many functions $\psi$ are considered at first, and then we look for an optimal function.

Let us recall a known fact often presented as the SIR pathology. Consider the regression model

(3.3) $Y = g(Z_1, Z_2, e),$

where $e \perp\!\!\!\perp Z \in \mathbb{R}^p$ and $g$ is symmetric with respect to its first coordinate. Assume also that $(Z_1, Z_2) \overset{d}{=} (-Z_1, Z_2)$. Then thanks to the linearity condition we have $Q_c E[Z\psi(Y)] = 0$, whereas the previous assumptions clearly imply that $E[Z_1\psi(Y)] = E[-Z_1\psi(Y)]$. Therefore for any measurable function $\psi$, we have that $E[Z\psi(Y)] = E[(0, Z_2, 0, \dots, 0)^T\psi(Y)]$ and consequently the first direction $(1, 0, \dots, 0)^T$ cannot be reached by any method based on the inverse regression curve. Clearly, TF1 is sensitive to the SIR pathology. Facing this difficulty, an idea developed first in Li (1991) and Cook and Weisberg (1991) is to explore some higher conditional moments of $Z$ given $Y$. Thus methods such as SIR-II, SAVE, CR, or DR are interested in some properties of the matrix $E[ZZ^T \mid Y]$. It is also the case for TF2. Nevertheless we do not follow the same path, especially concerning the assumptions required to explore this second order moment. These kinds of methods assume first that $Z$ has an elliptical distribution or at least satisfies the linearity condition, and secondly that $\mathrm{var}(Z \mid P_c Z)$ is a constant, i.e. CCV. The following proposition shows how strong the last two assumptions are.

Proposition 3.2. Let $Z$ be a random vector of $\mathbb{R}^p$ ($p > 2$) with a finite second order moment. If $Z$ is spherical and if $\mathrm{var}(Z \mid PZ) = \mathrm{const.}$ for some orthogonal projection $P$, then $Z$ is normal and conversely.

Proof. This proposition follows from Theorem 4.1.4, p. 48 of Bryc (1995). □



Accordingly, the assumptions required for order 2 methods are really close to the assumption of normality on the distribution of the predictors. TF2 works under weaker conditions. Actually, the CCV condition is no longer needed and we substitute it by the following assumption.

Assumption 3. (Diagonal conditional variance (DCV))

$\mathrm{var}(Z \mid P_c Z) = \lambda_\omega^* Q_c$ a.s.,

with $\lambda_\omega^*$ a real random variable.

In Remark 2 we attempt to compare CCV and DCV. To facilitate future proofs and for a better understanding of such a condition we provide an equivalent form in the following lemma.

Lemma 3.3. Assume that $Z$ has a finite second moment. Then the following assertions are equivalent,

(1) for any orthogonal transformation $H$ such that $HP_c = P_c$, we have

$\mathrm{var}(Z \mid P_c Z) = \mathrm{var}(HZ \mid P_c Z),$

(2) $\mathrm{var}(Z \mid P_c Z) = \lambda_\omega^* Q_c$ with $\lambda_\omega^*$ a real random variable.

Moreover, under the linearity condition necessarily $\lambda_\omega^* = \frac{1}{p-d} E\left[\|Q_c Z\|^2 \mid P_c Z\right]$.

Proof. Let us begin with the easiest way: (2) $\Rightarrow$ (1). Let $H$ be any orthogonal matrix as described in (1). Because $HQ_cH^T = I - HP_cH^T = Q_c$, by multiplying (2) on the left side by $H$ and on the right side by $H^T$, we find that

$\mathrm{var}(HZ \mid P_c Z) = \lambda_\omega^* Q_c = \mathrm{var}(Z \mid P_c Z).$

The other way is based on a good choice of the matrix $H$. Let $\gamma$ be a unit vector of $E_c^\perp$, and define $H = I - 2\gamma\gamma^T$. Clearly, $H$ is symmetric and satisfies the requirement of (1). So we have the equation

$\mathrm{var}(Z \mid P_c Z) = (I - 2\gamma\gamma^T)\, \mathrm{var}(Z \mid P_c Z)\, (I - 2\gamma\gamma^T),$

developing the right hand side, it follows that

$\mathrm{var}(Z \mid P_c Z)\gamma\gamma^T = 2\, \mathrm{var}(\gamma^T Z \mid P_c Z)\gamma\gamma^T - \gamma\gamma^T \mathrm{var}(Z \mid P_c Z),$

and finally, multiplying by $\gamma$ on the right, we find

(3.4) $\mathrm{var}(Z \mid P_c Z)\gamma = \mathrm{var}(\gamma^T Z \mid P_c Z)\gamma.$

Therefore, any $\gamma \in E_c^\perp$ is an eigenvector of $\mathrm{var}(Z \mid P_c Z)$ and thus $E_c^\perp$ is an eigenspace of this matrix. Denote by $\lambda_\omega^*$ the eigenvalue associated to $E_c^\perp$. Since the columns of $Q_c$ are vectors of $E_c^\perp$, we have

$\mathrm{var}(Z \mid P_c Z) Q_c = \lambda_\omega^* Q_c,$

which implies that

$\mathrm{var}(Z \mid P_c Z) = \mathrm{var}(Q_c Z \mid P_c Z) = \lambda_\omega^* Q_c,$

and (1) $\Rightarrow$ (2) is completed.

The value of $\lambda_\omega^*$ can be given by equation (3.4). Clearly, under the linearity condition we have for every unit vector $\gamma \in E_c^\perp$,

$\lambda_\omega^* = \mathrm{var}(\gamma^T Z \mid P_c Z) = E\left[(\gamma^T Z)^2 \mid P_c Z\right],$

and hence it suffices to sum this identity over an orthonormal basis $(\gamma_1, \dots, \gamma_{p-d})$ of $E_c^\perp$, using $\sum_{k=1}^{p-d} (\gamma_k^T Z)^2 = \|Q_c Z\|^2$, to obtain

$\lambda_\omega^* = \frac{1}{p-d} E\left[\|Q_c Z\|^2 \mid P_c Z\right].$

□
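
The reflection $H = I - 2\gamma\gamma^T$ used in the proof can be checked numerically. The sketch below uses a hypothetical configuration in $\mathbb{R}^4$ with $E_c = \mathrm{span}(e_1, e_2)$; the matrices `V_dcv` and `V_other` are made-up stand-ins for a conditional variance and are not taken from the original text.

```python
import numpy as np

p = 4
P_c = np.diag([1.0, 1.0, 0.0, 0.0])      # E_c = span(e_1, e_2), a toy choice
Q_c = np.eye(p) - P_c
gamma = np.array([0.0, 0.0, 0.6, 0.8])   # unit vector of the orthogonal complement of E_c
H = np.eye(p) - 2.0 * np.outer(gamma, gamma)

V_dcv = 1.7 * Q_c                        # DCV form: lambda * Q_c (lambda = 1.7 is arbitrary)
V_other = np.diag([0.0, 0.0, 1.0, 2.0])  # supported on the complement but not proportional to Q_c

is_orthogonal = np.allclose(H @ H.T, np.eye(p))
fixes_E_c = np.allclose(H @ P_c, P_c)
dcv_invariant = np.allclose(H @ V_dcv @ H.T, V_dcv)
other_invariant = np.allclose(H @ V_other @ H.T, V_other)
```

The reflection is orthogonal and leaves $E_c$ fixed, a matrix of the form $\lambda Q_c$ is invariant under it, and a matrix with two different eigenvalues on $E_c^\perp$ is not, which is the mechanism behind the equivalence of (1) and (2).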

Remark 2. Here we compare CCV and DCV. Each existing method being based on close but sometimes different assumptions, it is difficult to build a complete sketch of the assumption sets used. Let us have a look at the interaction with the spherical assumption. First, Proposition 3.2 informs us that coupling the CCV condition and the spherical assumption is equivalent to normality. But in our case, sphericity implies DCV. Indeed, if $Z$ is spherical, then its distribution is invariant by any orthogonal transformation, and we have for any measurable function $f$ and for any orthogonal matrix $H$,

$E[ZZ^T f(P_c Z)] = E[HZZ^TH^T f(P_c HZ)].$

In particular, the previous equation is true for any $H$ which leaves invariant the vectors of $E_c$ and we obtain (1) of Lemma 3.3, which is equivalent to DCV. Thus, we have just proved that the spherical assumption implies DCV.

Theorem 3.4 is the analogue of Theorem 3.1 for TF2.
Theorem 3.4. Define the matrix $M_\psi = E[ZZ^T\psi(Y)]$. Assume that $Z$ satisfies Assumptions 1 and 3 and has a finite second moment. Then, for every measurable function $\psi : \mathbb{R} \to \mathbb{R}$ such that $E[ZZ^T\psi(Y)] < \infty$, we have

$\mathrm{span}(M_\psi - \lambda_\psi^* I) \subset E_c,$

with $\lambda_\psi^* = \frac{1}{p-d} E\left[\|Q_c Z\|^2 \psi(Y)\right]$.

Proof. To make a complete proof, we need to show that all the vectors in $E_c^\perp$ are eigenvectors of the symmetric matrix $M_\psi - \lambda_\psi^* I$ associated to the eigenvalue 0. The existence of the CS ensures that

$M_\psi - \lambda_\psi^* I = E\left[\left(E[ZZ^T \mid P_c Z] - \lambda_\omega^* I\right)\psi(Y)\right],$

besides, thanks to the linearity condition and DCV, we have

$E[ZZ^T \mid P_c Z] = \lambda_\omega^* Q_c + P_c ZZ^T P_c.$

Thus, for any $\gamma \in E_c^\perp$ we have $(M_\psi - \lambda_\psi^* I)\gamma = 0$ and the proof is completed. □

In practice, because $\lambda_\psi^*$ is unknown, it seems difficult to use Theorem 3.4. Nevertheless, we do not really need to know this particular eigenvalue because a consequence of Theorem 3.4 is that $E_c^\perp$ is an eigenspace of the matrix $M_\psi$ associated to the eigenvalue equal to $\lambda_\psi^*$. Therefore, if the dimension of $E_c^\perp$ is large, then the spectrum of $M_\psi$ would have an accumulation of eigenvalues equal to $\lambda_\psi^*$. What we expect is that the other eigenvalues will be different from $\lambda_\psi^*$. If it is true, all the directions of $E_c$ could be recovered and this eigenvalue problem is the topic of the next section.
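
The accumulation of eigenvalues described above, and the contrast with the order 1 approach on a pathological model, can be observed on simulated data. The following sketch relies on assumptions that are not in the original text: Gaussian predictors, the toy model $Y = Z_1^2 + \text{noise}$ with $E_c = \mathrm{span}(e_1)$, and the test function $\psi(y) = 1_{\{y > 1\}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, d = 50_000, 5, 1
Z = rng.normal(size=(n, p))                  # spherical predictors (assumption of this toy example)
y = Z[:, 0] ** 2 + 0.1 * rng.normal(size=n)  # symmetric in Z_1: the SIR pathology
psi = (y > 1.0).astype(float)                # hypothetical test function psi(y) = 1{y > 1}

# Order 1: the empirical E[Z psi(Y)] carries no signal in the first direction.
tf1_vector = Z.T @ psi / n

# Order 2: empirical M_psi = E[Z Z^T psi(Y)].
M_psi = (Z * psi[:, None]).T @ Z / n
eigval, eigvec = np.linalg.eigh(M_psi)       # eigenvalues in ascending order

# Empirical lambda*_psi = E[||Q_c Z||^2 psi(Y)] / (p - d), using the known E_c = span(e_1).
lam_star = np.mean(np.sum(Z[:, 1:] ** 2, axis=1) * psi) / (p - d)
```

In this example the $p - d = 4$ smallest eigenvalues of the empirical $M_\psi$ cluster around `lam_star`, while the remaining eigenvalue is well separated and its eigenvector is close to $\pm e_1$, a direction that the order 1 vector `tf1_vector` fails to detect.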

4. Covering the central subspace 

In this section, we find that a way to obtain an exhaustive characterization of the CS for TF1 and TF2 is to consider many functions $\psi$. As usual, we begin with TF1 and conclude with TF2.

4.1. Order 1 test function. As a consequence of Theorem 3.1, spaces generated by $(E[Z\psi_1(Y)], \dots, E[Z\psi_k(Y)])$ are included in $E_c$. Our goal is to obtain the converse inclusion. Because TF1 is an extension of SIR, the latter has a central place in the following argumentation. We start by giving a necessary and sufficient condition for covering the entire CS with SIR. Then under the same condition we extend SIR to a new class of methods.

Assumption 4. For every nonzero vector $\eta \in E_c$, $E[\eta^T Z \mid Y]$ has a nonzero variance.

Equation (3.3) provides a regression model for which a direction of $E_c$ is almost surely orthogonal to $E[Z \mid Y]$. It is clear that this kind of situation is no longer allowed by the previous assumption. However, TF2 is designed to handle such pathological cases.

Lemma 4.1. If $Z$ satisfies Assumption 1 and has a finite second moment, then Assumption 4 implies that $\mathrm{span}(M_{SIR}) = E_c$ and conversely.

Proof. Under the linearity condition, $\mathrm{span}(M_{SIR}) = E_c$ is equivalent to $\eta^T M_{SIR}\, \eta > 0$ for every nonzero $\eta \in E_c$, which is another formulation of Assumption 4. □
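
The equivalence invoked in the proof of Lemma 4.1 can be spelled out as follows. This is a short derivation under the standing assumptions that $E[Z] = 0$ (standardization) and $M_{SIR} = E[Z E[Z \mid Y]^T]$; it is a sketch of the intermediate step, not a quotation of the original proof.

```latex
\begin{align*}
\eta^T M_{SIR}\, \eta
  &= E\big[\eta^T Z \, E[Z \mid Y]^T \eta\big]
   = E\big[ E[\eta^T Z \mid Y]^2 \big] \\
  &= \operatorname{var}\big(E[\eta^T Z \mid Y]\big),
  \qquad \text{since } E\big[E[\eta^T Z \mid Y]\big] = \eta^T E[Z] = 0 .
\end{align*}
```

Since $M_{SIR}$ is positive semi-definite with column space included in $E_c$ under the linearity condition, its column space is all of $E_c$ exactly when this quantity is positive for every nonzero $\eta \in E_c$.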

We now extend Lemma 4.1 to TF1; the aim is to provide the same results replacing the conditional expectation $E[Z \mid Y]$ in $M_{SIR}$ by some known family of functions. To state the following theorem, we introduce the function space $L_1(\theta(y)\mu(dy))$ defined as

$L_1(\theta(y)\mu(dy)) = \left\{ u : \mathbb{R} \to \mathbb{R};\ \int_{\mathbb{R}} |u(y)|\, \theta(y)\, \mu(dy) < +\infty \right\},$

where $\theta : \mathbb{R} \to \mathbb{R}_+$ and $\mu$ is a real measure.
Theorem 4.2. Assume that $Z$ and $Y$ satisfy Assumptions 1 and 4. Assume also that $Z$ has a finite second moment. If $\Psi$ is a total countable family in the space $L_1(E[\|Z\| \mid Y = y]\, P_Y(dy))$, then we can extract a finite subset $\Psi_H$ of $\Psi$ such that

$\mathrm{span}\left(E[Z\psi(Y)],\ \psi \in \Psi_H\right) = E_c.$

Proof. Lemma 4.1 provides that $\{E[Z E[Z_k \mid Y]],\ k = 1, \dots, p\}$ is a generator of $E_c$. First, let us show that any vector of this family can be approximated by $E[Z\phi(Y)]$, where $\phi$ is a linear combination of functions in $\Psi$. Let $\epsilon > 0$ and $k \in \{1, \dots, p\}$. Since $\Psi$ is a total family in $L_1(E[\|Z\| \mid Y = y]\, P_Y(dy))$, there exists $\phi_k$, a finite linear combination of functions in $\Psi$, such that

$E\left[E[\|Z\| \mid Y]\ \left|\phi_k(Y) - E[Z_k \mid Y]\right|\right] < \epsilon,$

besides, we have

$\left\|E[Z\phi_k(Y)] - E[Z E[Z_k \mid Y]]\right\| = \left\|E\left[E[Z \mid Y]\left(\phi_k(Y) - E[Z_k \mid Y]\right)\right]\right\| \le E\left[E[\|Z\| \mid Y]\ \left|\phi_k(Y) - E[Z_k \mid Y]\right|\right],$

and therefore,

(4.1) $\left\|E[Z\phi_k(Y)] - E[Z E[Z_k \mid Y]]\right\| < \epsilon.$

Here an important point is that $E[Z\phi_k(Y)] \in E_c$; it implies that

(4.2) $\mathrm{span}\left(E[Z\phi_k(Y)],\ k = 1, \dots, p\right) \subset \mathrm{span}(M_{SIR}).$

Moreover, (4.1) and the continuity of the determinant imply that the rank of the set of vectors $E[Z\phi_k(Y)]$ is equal to $d$ if $\epsilon$ is small enough. Then, instead of an inclusion, (4.2) becomes an equality and we complete the proof by recalling that each $\phi_k$ is a linear combination of a finite number of functions in $\Psi$. □



Theorem 4.2 assumes that the family is total. Some mild conditions can be found in Coudene. Let us recall their main result.

Theorem. (Y. Coudene) Let $p \in [1, \infty[$, $\mu$ a Borel probability measure on $[0, 1]$, and $f_n : [0, 1] \to \mathbb{R}$ a family of bounded measurable functions that separates the points:

$\forall x, y \in [0, 1],\ x \neq y,\ \exists n \in \mathbb{N}$ such that $f_n(x) \neq f_n(y)$.

Then the algebra spanned by the functions $f_n$ and the constants is dense in $L_p([0, 1], \mu)$.

Remark 3. Accordingly, we can apply Theorem 4.2 with any family of functions that separates the points, for example polynomials, complex exponentials or indicator functions. To make possible a simple use of this theorem we need to recall this result. If $u = (u_1, \dots, u_H)$ is an $\mathbb{R}^p$ vector family, then $\mathrm{span}(uu^T) = \mathrm{span}(u)$. Thus, if we denote by $\psi_1, \dots, \psi_H$ some elements of a family that separates the points, then the CS can be obtained by making an eigendecomposition of the order 1 test function matrix associated to the functions $\psi_1, \dots, \psi_H$ defined as

$M_{TF1} = \sum_{h=1}^{H} E[Z\psi_h(Y)]\, E[Z\psi_h(Y)]^T.$

Especially, the eigenvectors associated to a nonzero eigenvalue of any order 1 test function matrix span the CS. Moreover, as pointed out in Cook and Ni (2005), for $H$ large enough $\mathrm{span}(\widetilde{M}_{SIR}) = \mathrm{span}(M_{SIR})$. A proof of this result can be obtained by Theorem 4.2. By applying it with the indicator family of functions, it gives that

$\mathrm{span}\left(E[Z 1_{\{Y \in I(h)\}}],\ h = 1, \dots, H\right) = \mathrm{span}(\widetilde{M}_{SIR}) = \mathrm{span}(M_{SIR}) = E_c$

if $H$ is sufficiently large. Also, SIR can be understood as a particular TF1. Expression (1.6) implies that

$\widetilde{M}_{SIR} = \sum_h \frac{1}{p_h} E[Z 1_{\{Y \in I(h)\}}]\, E[Z 1_{\{Y \in I(h)\}}]^T,$

hence, SIR is equivalent to TF1 realized with the weighted family of indicator functions

$\left(1_{\{Y \in I(h)\}} / \sqrt{p_h},\ h = 1, \dots, H\right).$

More generally, for any family of functions, the space spanned by $M_{TF1}$ is not changed by a weighting with positive weights. Nevertheless this is no longer the case for the estimated space, and intuitively such a weighting could influence the convergence rate. The choice of the weights for the family of indicators is discussed in Section 5.1 through a variance minimization.
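The construction of this subsection can be sketched numerically. The snippet below is only an illustration under stated assumptions: the helper names (`order1_tf_matrix`, `slice_labels`, `weighted_indicators`) are hypothetical, the slices are taken with equal counts, and expectations are replaced by empirical means. It builds the empirical order 1 test function matrix from any matrix of evaluated test functions, and the weighted indicators that recover the sliced SIR matrix.

```python
import numpy as np

def order1_tf_matrix(Z, Psi):
    """Empirical order 1 test function matrix.

    Z   : (n, p) standardized predictors.
    Psi : (n, H) matrix whose column h holds psi_h(Y_i).
    Returns sum_h m_h m_h^T with m_h = (1/n) sum_i Z_i psi_h(Y_i).
    """
    m = Z.T @ Psi / Z.shape[0]          # (p, H), column h estimates E[Z psi_h(Y)]
    return m @ m.T

def slice_labels(Y, H):
    """Label each observation with one of H equal-count slices (an assumption)."""
    ranks = np.argsort(np.argsort(Y))   # ranks 0..n-1
    return (ranks * H) // len(Y)

def weighted_indicators(Y, H):
    """Indicator functions of the slices, each divided by sqrt(p_h)."""
    ind = np.eye(H)[slice_labels(Y, H)] # (n, H) one-hot slice membership
    p_h = ind.mean(axis=0)              # empirical slice probabilities
    return ind / np.sqrt(p_h)
```

Any other family evaluated on the response, for instance `np.column_stack([Y, Y**2, Y**3])`, can be passed as `Psi` in the same way.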

4.2. Order 2 test function. As described before, an important tool in this section is the eigendecomposition of the matrix $M_\psi$, therefore we try to be more explicit by introducing the following notation. Let $\lambda_\psi$ and $\lambda_Y$ be the functions $\mathbb{R}^p \to \mathbb{R}$ respectively defined by

$$\lambda_\psi(\eta) = \mathbb{E}[(\eta^T Z)^2 \psi(Y)] \quad \text{and} \quad \lambda_Y(\eta) = \mathbb{E}[(\eta^T Z)^2 | Y],$$

and notice that if $\eta$ is a unit eigenvector of $M_\psi$ (resp. $\mathbb{E}[ZZ^T|Y]$), then $\lambda_\psi(\eta)$ (resp. $\lambda_Y(\eta)$) is equal to the eigenvalue of the matrix $M_\psi$ (resp. $\mathbb{E}[ZZ^T|Y]$) associated with $\eta$. However, recalling that $E_c^\perp$ is an eigenspace of $M_\psi$ and $\mathbb{E}[ZZ^T|Y]$, the functions $\lambda_\psi$ and $\lambda_Y$ are both constant on the centered spheres of $E_c^\perp$. Their respective values on the unit sphere of $E_c^\perp$ are noted $\lambda_\psi^*$ and $\lambda_Y^*$.

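Definition 1. Let $\psi$ be a measurable function. We call $\psi$-space and note $E_\psi$ the space

$$E_\psi = \operatorname{span}(M_\psi - \lambda_\psi^* I) = \operatorname{span}\{\eta \in B(0,1) \subset \mathbb{R}^p,\ M_\psi \eta = \lambda_\psi^* \eta\}^\perp.$$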

Thanks to Theorem 3.4 we have already proved that under Assumptions [1] and [3] any $\psi$-space is included in $E_c$. However, nothing guarantees the existence of a $\psi$-space equal to $E_c$. We follow the same idea as for the order 1 method, i.e. we consider some transformations of $Y$ belonging to a dense family. Nevertheless, the results are a little different because we prove the existence of a single $\psi$-space equal to $E_c$. Only one additional assumption is needed.

Assumption 5.

$$\forall \eta \in E_c,\ \|\eta\| = 1, \qquad P\left(\mathbb{E}[(\eta^T Z)^2 | Y] = \mathbb{E}\left[\frac{\|Q_c Z\|^2}{p-d}\,\Big|\,Y\right]\right) < 1.$$

Remark 4. Assumption 5 takes the same approach as Li and Wang (2007). As highlighted in a previous remark, our set of assumptions is weaker than theirs because DCV has replaced CCV. To match their context, assume that the CCV condition is satisfied. Then clearly, Assumption 5 becomes "$\mathbb{E}[(\eta^T Z)^2|Y]$ is nondegenerate", i.e. is not a.s. constant. Otherwise, TF1 allows an exhaustive estimation of the CS provided that $\mathbb{E}[(\eta^T Z)|Y]$ is nondegenerate. Thus the exhaustiveness condition of TF is the union of the two previous ones and it gives

$$\mathbb{E}[(\eta^T Z)^2|Y] \ \text{or}\ \mathbb{E}[(\eta^T Z)|Y] \ \text{is nondegenerate},$$

which is the same as the one provided for DR in Li and Wang (2007). Accordingly, TF is developed in a more general context given by DCV, but the assumptions ensuring its exhaustiveness are as weak as the ones in the literature.

In the proof of the following theorem we will need Lemma 1 and Proposition 2, which are stated and proved in the appendix.

Theorem 4.3. Assume that Z and Y satisfy Assumptions [Jl andO Assume also that Z 
has a finite second moment, then if $\Psi$ is a total countable family in the space $L_1(\mathbb{E}[\|Z\|^2 | Y = y] P_Y(dy))$, there exists $\psi$, a finite linear combination of functions in $\Psi$, such that

$$E_\psi = E_c.$$

Proof. Let $\Psi$ be a total countable family in $L_1(\mathbb{E}[\|Z\|^2 | Y = y] P_Y(dy))$. By Theorem 3.4, $E_\psi \subset E_c$ for any $\psi$. Then it suffices to show that there exists $\psi$, a finite linear combination of functions in $\Psi$, such that $\dim(E_\psi) = \operatorname{rank}(M_\psi - \lambda_\psi^* I) = d$. In the basis $(P_1, P_2)$, where $P_1$ and $P_2$ are respectively bases of $E_c$ and $E_c^\perp$, the matrix $M_\psi - \lambda_\psi^* I$ can be written as

$$\begin{pmatrix} N_\psi & 0 \\ 0 & 0 \end{pmatrix}$$

with $N_\psi = P_1^T (M_\psi - \lambda_\psi^* I) P_1$. Notice that the space

$$\mathcal{M} = \Big\{ N_\psi,\ \psi = \sum_h \alpha_h \psi_h \text{ (finite sum)},\ \psi_h \in \Psi \Big\}$$

is a linear subspace of the symmetric matrices of dimension $d \times d$. In the basis $(P_1, P_2)$, Assumption 5 becomes

$$\forall \eta \in \mathbb{R}^d, \quad P(\eta^T N_Y \eta = 0) < 1,$$

with $N_Y = P_1^T (M_Y - \lambda_Y^* I) P_1$. Clearly, this implies that

$$\forall \eta \in \mathbb{R}^d,\ \exists \psi, \quad \eta^T N_\psi \eta \neq 0, \tag{4.3}$$

and because $\Psi$ is a total family in $L_1(\mathbb{E}[\|Z\|^2 | Y = y] P_Y(dy))$, the function $\psi$ in the previous equation can be taken as a finite linear combination of functions in $\Psi$, and then $N_\psi \in \mathcal{M}$. Thus the proof consists in showing that, given a linear subspace $\mathcal{M} \subset \mathbb{R}^{d \times d}$ of symmetric matrices, if (4.3) holds, then there exists an invertible matrix in $\mathcal{M}$. The contrapositive is the statement of Proposition 2. □

Theorem 4.3 states the existence of a $\psi$-space equal to $E_c$, yet it does not provide an explicit form of such a $\psi$. Hence, we set out the following corollary.

Corollary 4.4. Assume that Z and Y satisfy Assumptions andO Assume also that Z 
has a finite second moment then, if $\Psi$ is a total countable family in the space $L_1(\mathbb{E}[\|Z\|^2 | Y = y] P_Y(dy))$, we have

$$\bigoplus_{h=1}^{H} E_{\psi_h} = E_c,$$

where $(\psi_1, \dots, \psi_H)$ is a finite subset of $\Psi$.

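Proof. From Theorem 4.3 we have $E_\psi = E_c$ where $\psi = \sum_{h=1}^{H} \alpha_h \psi_h$. Hence, we need to show that $E_\psi \subset \bigoplus_h E_{\psi_h}$ since the other inclusion is trivial. Suppose that there exists $\eta \in E_\psi$ with norm 1 such that $\eta \perp \bigoplus_h E_{\psi_h}$. Then by definition, for every $h = 1, \dots, H$,

$$M_{\psi_h} \eta = \lambda_{\psi_h}^* \eta,$$

and we obtain

$$M_\psi \eta = \sum_{h=1}^{H} \alpha_h M_{\psi_h} \eta = \sum_{h=1}^{H} \alpha_h \lambda_{\psi_h}^* \eta = \lambda_\psi^* \eta,$$

which is impossible because $\eta \in E_\psi$. □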

Corollary 4.4 is the counterpart of Theorem 4.2 for TF2. Nevertheless, it seems difficult to use in practice because it requires the eigendecomposition of a large number of matrices. Besides, Theorem 4.3 is the cornerstone of TF2. Using the theorem quoted in Remark 3, we provide order 2 methods based on families of functions that separate the points. For each such family, there exists a function $\psi$ such that the associated $\psi$-space is equal to the CS.



5. Choice of the test function. Asymptotic variance minimization 

This section is divided into two parts. First, we study the case of the family of indicator functions for TF1 and secondly, we are interested in finding the best $\psi$ for TF2. Clearly, for the order 1 method we need at least $d$ functions to recover the CS whereas for the order 2, as we showed before, we can expect to find a single function $\psi$ such that $E_\psi$ covers all the directions of the CS. This is the reason why we fix the class of functions in the first part and search for a single function in the second part.
5.1. Order 1 test function: optimality among the indicators. In this section, we develop a test function plug-in method based on the minimization of the estimation variance in the case of the family of indicator functions for $\psi_h$. Theorem 4.2 and Remark 3 imply that the whole subspace $E_c$ can be covered by the family of vectors $\{\mathbb{E}[Z\mathbf{1}_{\{Y \in I(h)\}}],\ h = 1, \dots, H\}$ for a suitable partition $I(h)$. Actually, it is possible to extract $d$ orthogonal vectors living in the space spanned by this family, which provides a basis of the CS. This procedure is realized by SIR. Nevertheless, the issue here is somewhat more complicated: we want to find $d$ orthogonal vectors that have the minimal asymptotic mean squared error for the estimation of the projection $P_c$.
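We define

$$\mathrm{MSE} = \mathbb{E}\big[\|P_c - \widehat{P}_n\|^2\big], \tag{5.1}$$

where $\|\cdot\|$ stands for the Frobenius norm and $\widehat{P}_n$ is derived from the family of vectors $\widehat{\eta} = (\widehat{\eta}_1, \dots, \widehat{\eta}_d)$ defined as

$$\widehat{\eta}_k = \frac{1}{n} \sum_{i=1}^{n} Z_i \psi_k(Y_i) \quad \text{with} \quad \psi_k(Y) = (\mathbf{1}_{\{Y \in I(1)\}}, \dots, \mathbf{1}_{\{Y \in I(H)\}})\,\alpha_k = \mathbf{1}_Y^T \alpha_k,$$

where $\alpha_k \in \mathbb{R}^H$. Besides, we introduce $\eta = (\eta_1, \dots, \eta_d)$ with $\eta_k = \mathbb{E}[Z\psi_k(Y)]$. Consequently, we aim at minimizing MSE with respect to the family $(\psi_k)_{1 \le k \le d}$, or equivalently with respect to the matrix $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^{H \times d}$. Moreover, since we have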

$$\begin{aligned} \mathrm{MSE} &= \mathbb{E}[\operatorname{tr}(P_c - \widehat{P}_n)^2] \\ &= d + \mathbb{E}[\widehat{d} - 2\operatorname{tr}((I - Q_c)\widehat{P}_n)] \\ &= \mathbb{E}[d - \widehat{d}] + 2\,\mathbb{E}[\operatorname{tr}(Q_c \widehat{P}_n)], \end{aligned} \tag{5.2}$$

and we suppose that $d$ is known, the minimization of MSE reduces to the minimization of the second term in the previous equality. Hence, this naturally leads us to the minimization problem

$$\min_{\alpha} \lim_{n \to \infty} n\,\mathbb{E}[\operatorname{tr}(Q_c \widehat{P}_n)],$$

under the constraint of orthogonality of the family $(\eta_k)_{1 \le k \le d}$. For a more tractable approach, we choose to minimize the expectation of the limit in distribution, instead of the limit of the expectation when $n$ goes to infinity, of the sequence $n \operatorname{tr}(Q_c \widehat{P}_n)$. To set out clearly the next proposition, let us introduce some notation. Define the matrices

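$$C = (C_1, \dots, C_H) \quad \text{with} \quad C_h = \mathbb{E}[Z\mathbf{1}_{\{Y \in I(h)\}}],$$

$$D = \operatorname{diag}_h(d_h) \quad \text{with} \quad d_h = \mathbb{E}[\|Q_c Z\|^2 \mathbf{1}_{\{Y \in I(h)\}}],$$

and

$$G = D^{-1/2} C^T C D^{-1/2}.$$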

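The matrix $G$ is the Gram matrix of the vector family $(C_h/\sqrt{d_h})_{1 \le h \le H}$; Theorem 4.2 and Remark 3 ensure that its rank is equal to $d$. Besides, $G$ is diagonalizable and so we define $P = (P_1\ P_2) \in \mathbb{R}^{H \times (d + (H - d))}$ such that

$$P^T G P = \begin{pmatrix} D_0 & 0 \\ 0 & 0 \end{pmatrix},$$

where $D_0 \in \mathbb{R}^{d \times d}$.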



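Proposition 5.1. The random variable $n \operatorname{tr}(Q_c \widehat{P}_n)$ has a limit in law $W_\alpha$ as $n \to \infty$. The minimization problem

$$\min_{\alpha} \mathbb{E}[W_\alpha] \quad \text{u.c.} \quad \eta^T \eta = I_d, \tag{5.3}$$

has a unique solution, up to orthogonal transformations, given by

$$\alpha = D^{-1/2} P_1 D_0^{-1/2}.$$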
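Proof. We first calculate the expectation of the limit in law of the sequence $n \operatorname{tr}(Q_c \widehat{P}_n)$ and then we solve the optimization problem. Since

$$\begin{aligned} n \operatorname{tr}(Q_c \widehat{P}_n) &= n \operatorname{tr}\big(\widehat{\eta}^T Q_c \widehat{\eta}\,(\widehat{\eta}^T \widehat{\eta})^{-1}\big) \\ &= \operatorname{tr}\big(\sqrt{n}(\widehat{\eta} - \eta)^T Q_c \sqrt{n}(\widehat{\eta} - \eta)\,(\widehat{\eta}^T \widehat{\eta})^{-1}\big), \end{aligned}$$

Slutsky's theorem and the continuity of the operator $\operatorname{tr}(\cdot)$ yield that $n \operatorname{tr}(Q_c \widehat{P}_n)$ converges to $\operatorname{tr}(\delta^T Q_c \delta)$ in distribution, where $\delta \in \mathbb{R}^{p \times d}$ is the limit in law of the sequence $\sqrt{n}(\widehat{\eta} - \eta)$, i.e. a normal vector with mean 0. Thus it remains to calculate the expectation of this limit. Notice that

$$\mathbb{E}[W_\alpha] = \mathbb{E}[\operatorname{tr}(\delta^T Q_c \delta)] = \sum_{k=1}^{d} \operatorname{tr}\big(Q_c\,\mathbb{E}[\delta_k \delta_k^T]\big),$$

where $\delta_k$ stands for the limit in law of the sequence $\sqrt{n}(\widehat{\eta}_k - \eta_k)$. Finally, since its variance is equal to $\operatorname{var}(Z\psi_k(Y))$ and using the linearity condition, we have

$$\mathbb{E}[W_\alpha] = \sum_{k=1}^{d} \mathbb{E}\big[\|Q_c Z\|^2 \psi_k(Y)^2\big]. \tag{5.4}$$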

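Now let us reformulate the minimization problem in terms of the matrix $\alpha$. First, from (5.4) and using that the $I(h)$ are pairwise disjoint, we have

$$\mathbb{E}[W_\alpha] = \sum_{k=1}^{d} \alpha_k^T\,\mathbb{E}\big[\|Q_c Z\|^2 \mathbf{1}_Y \mathbf{1}_Y^T\big]\,\alpha_k = \operatorname{tr}(\alpha^T D \alpha), \tag{5.5}$$

and also,

$$\eta^T \eta = \alpha^T C^T C \alpha = (D^{1/2}\alpha)^T G\,D^{1/2}\alpha. \tag{5.6}$$

From (5.5) and (5.6) we set out the equivalent minimization problem

$$\min_{\alpha} \operatorname{tr}(\alpha^T D \alpha) \quad \text{u.c.} \quad (D^{1/2}\alpha)^T G\,D^{1/2}\alpha = I_d,$$

then, from the change of variable $U = P^T D^{1/2} \alpha$ we derive

$$\min_{U} \operatorname{tr}(U^T U) \quad \text{u.c.} \quad U^T \begin{pmatrix} D_0 & 0 \\ 0 & 0 \end{pmatrix} U = I_d.$$

By writing $U^T = (U_1^T, U_2^T)$ we notice that there is no constraint on $U_2$, which implies that $U_2 = 0$. Consequently, it remains to solve

$$\min \operatorname{tr}(U_1 U_1^T) \quad \text{u.c.} \quad U_1 U_1^T = D_0^{-1}, \tag{5.7}$$

where $U_1 \in \mathbb{R}^{d \times d}$. Clearly, in (5.7) the quantity to minimize is fixed by the constraint. Then, a solution is given by $U_1 = D_0^{-1/2} H$ where $H$ is any orthogonal matrix. Hence, the solution of (5.3) is

$$\alpha = D^{-1/2} P U = D^{-1/2} P_1 D_0^{-1/2} H,$$

where $H$ is any orthogonal matrix. □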

To make a link with other methods and facilitate the programming of TF1, let us explain the solution in another way. Instead of expressing the solution in terms of the weights put on the indicator functions, we express it in terms of the vectors $\eta_k$ associated with these weights. First notice that, with the chosen notation,

$$D^{-1/2} C^T C D^{-1/2} P_1 = P_1 D_0;$$

multiplying by $C D^{-1/2}$ on the left and by $D_0^{-1/2}$ on the right gives

$$C D^{-1} C^T\,C D^{-1/2} P_1 D_0^{-1/2} = C D^{-1/2} P_1 D_0^{-1/2} D_0.$$

Defining an order 1 test function matrix $M_{TF1} = C D^{-1} C^T$, and noting that $\eta = C D^{-1/2} P_1 D_0^{-1/2}$, the previous equation is equivalent to

$$M_{TF1}\,\eta = \eta D_0.$$

Thus, since $M_{TF1}$ has the same rank as $G$, we have shown that the vectors $\eta_k$ derived from the optimal weight family are the eigenvectors of $M_{TF1}$ associated with nonzero eigenvalues. Besides, it is easy to verify that the previous development remains true when each quantity is replaced by its estimate. Therefore in practice, we have to compute the eigendecomposition of an estimator of the matrix $M_{TF1}$.
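The algebra above can be checked numerically. The following sketch uses arbitrary made-up matrices (a random rank-$d$ matrix standing for $C$ and random positive entries standing for the $d_h$, none of which come from the paper) and verifies that the weights $\alpha = D^{-1/2} P_1 D_0^{-1/2}$ give orthonormal vectors $\eta = C\alpha$ that are eigenvectors of $C D^{-1} C^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, H = 6, 2, 5

# Hypothetical inputs: C has rank d, D is a positive diagonal matrix.
basis = np.linalg.qr(rng.standard_normal((p, d)))[0]   # (p, d) orthonormal columns
C = basis @ rng.standard_normal((d, H))                 # (p, H) of rank d
d_h = rng.uniform(0.5, 2.0, size=H)

D_inv_half = np.diag(d_h ** -0.5)
G = D_inv_half @ C.T @ C @ D_inv_half                   # Gram matrix of C_h / sqrt(d_h)

# eigh returns eigenvalues in ascending order: the last d are the nonzero ones.
w, P = np.linalg.eigh(G)
P1 = P[:, -d:]
D0 = np.diag(w[-d:])

alpha = D_inv_half @ P1 @ np.diag(w[-d:] ** -0.5)       # optimal weights (with H = I)
eta = C @ alpha                                         # associated vectors eta_k
M_tf1 = C @ np.diag(1.0 / d_h) @ C.T                    # C D^{-1} C^T
```

With these definitions, `eta.T @ eta` is the identity and `M_tf1 @ eta` equals `eta @ D0` up to rounding, which is the eigenvector property stated in the text.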

As stated in the introduction of Section 3.1, the SIR estimator is obtained through an eigendecomposition of the matrix $\widetilde{M}_{SIR}$, while our matrix of interest here is $M_{TF1}$. To compare both methods, we write their expressions as follows

$$\widetilde{M}_{SIR} = \sum_{h=1}^{H} \frac{C_h C_h^T}{p_h}, \qquad M_{TF1} = \sum_{h=1}^{H} \frac{C_h C_h^T}{d_h}. \tag{5.8}$$

As we noticed before, SIR is really close to the order 1 test function method proposed here: both methods try to extract the information contained in the slices through the $C_h$. This information is collected more efficiently by TF1 because it minimizes the criterion (5.1), and as a consequence the convergence rate is expected to be better. This idea is supported by the expression of $M_{TF1}$, in which bad slices are less weighted. When $H \to \infty$, $\widetilde{M}_{SIR} \to M_{SIR}$ and clearly $M_{TF1}$ converges to

$$\mathbb{E}\left[\frac{\mathbb{E}[Z|Y]\,\mathbb{E}[Z|Y]^T}{\mathbb{E}[\|Q_c Z\|^2 | Y]}\right].$$

As a consequence of (5.8), the TF1 variance minimization with indicators requires the knowledge of $Q_c$. Therefore we set out a plug-in method to estimate $Q_c$.
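TF1 Algorithm:

(0) Standardization of $X$ into $Z$. Initialize $\widehat{Q}_c = I$.

(1) Compute

$$\widehat{d}_h = \frac{1}{n} \sum_{i=1}^{n} \|\widehat{Q}_c Z_i\|^2 \mathbf{1}_{\{Y_i \in I(h)\}}, \qquad \widehat{C}_h = \frac{1}{n} \sum_{i=1}^{n} Z_i \mathbf{1}_{\{Y_i \in I(h)\}}$$

and $\widehat{M} = \sum_{h=1}^{H} \widehat{C}_h \widehat{C}_h^T / \widehat{d}_h$.

(2) Extract $\widehat{\eta} = (\widehat{\eta}_1, \dots, \widehat{\eta}_d)$: the $d$ eigenvectors of $\widehat{M}$ with largest eigenvalues.

(3) $\widehat{Q}_c = I - \widehat{\eta}\widehat{\eta}^T$.

Steps 1 to 3 are repeated until convergence is achieved and then $\widehat{\eta}$ is the estimated basis of the standardized CS derived from TF1. The estimated directions of the CS are $\widehat{\Sigma}^{-1/2}\widehat{\eta}$. At the end of the paper, this method is tested and compared to SIR using simulations.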
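The TF1 Algorithm can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the stopping rule (`n_iter`, `tol`) and the use of equal-count slices built from ranks are assumptions made here for the example.

```python
import numpy as np

def standardize(X):
    """Return Z = (X - mean) Sigma^{-1/2} and the matrix Sigma^{-1/2}."""
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    return (X - X.mean(axis=0)) @ S_inv_half, S_inv_half

def slice_indicators(Y, H):
    """(n, H) one-hot matrix of H equal-count slices of the response."""
    ranks = np.argsort(np.argsort(Y))
    return np.eye(H)[(ranks * H) // len(Y)]

def tf1(X, Y, d, H, n_iter=50, tol=1e-10):
    """Iterate steps (1)-(3) of the TF1 Algorithm; d is assumed known."""
    Z, S_inv_half = standardize(X)                      # step (0)
    n, p = Z.shape
    ind = slice_indicators(Y, H)
    C = Z.T @ ind / n                                   # columns are C_h hat
    Q = np.eye(p)                                       # initial Q_c hat
    for _ in range(n_iter):
        # step (1): d_h hat, then M hat = sum_h C_h C_h^T / d_h
        d_h = ((Z @ Q) ** 2).sum(axis=1) @ ind / n
        M = (C / d_h) @ C.T
        # step (2): d leading eigenvectors (eigh sorts eigenvalues ascending)
        eta = np.linalg.eigh(M)[1][:, -d:]
        # step (3): update the projector on the orthogonal complement
        Q_new = np.eye(p) - eta @ eta.T
        converged = np.linalg.norm(Q_new - Q) < tol
        Q = Q_new
        if converged:
            break
    return S_inv_half @ eta                             # directions on the X scale
```

For instance, `tf1(X, Y, d=1, H=10)` returns a `(p, 1)` array whose column estimates the single direction of the CS.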



5.2. Order 2 test function: optimality among the measurable functions. Here the approach differs from the one used for TF1: we aim at finding the optimal $\psi$ such that the variance error is minimal. Recall that $M_\psi = \mathbb{E}[ZZ^T\psi(Y)]$. We have already proved that the eigenvectors of this matrix can be decomposed into two blocks: the one associated with the eigenvalue $\lambda_\psi^*$ and the other, which necessarily belongs to $E_c$. Therefore, $\widehat{P}_n$ is derived from the eigenvectors associated with the eigenvalues different from $\lambda_\psi^*$, and so we decided to express $\widehat{P}_n$ in the following way. Theorem 4.3 guarantees the existence of $\psi$ such that $E_\psi = E_c$. Based on this result, suppose that we are able to distinguish each eigenvalue associated with an eigenvector in $E_c$ from $\lambda_\psi^*$. Then we can find a contour $\mathcal{C}$ which encloses the eigenvalues different from $\lambda_\psi^*$, and finally we can write $P_c$ and its estimator $\widehat{P}_n$ as

$$P_c = \oint_{\mathcal{C}} (Iz - M_\psi)^{-1}\,dz \quad \text{and} \quad \widehat{P}_n = \oint_{\mathcal{C}} (Iz - \widehat{M}_\psi)^{-1}\,dz,$$

where $\widehat{M}_\psi = \frac{1}{n}\sum_{i=1}^{n} Z_i Z_i^T \psi(Y_i)$ and the contour integrals are understood with the normalizing factor $1/(2i\pi)$. As we did for TF1, we aim at minimizing the MSE through the quantity $\mathbb{E}[\operatorname{tr}(Q_c \widehat{P}_n)]$ (see equation (5.2)). We first calculate the limit in law of the random variable $n \operatorname{tr}(Q_c \widehat{P}_n)$ as $n$ goes to infinity and then we derive its expectation. The next proposition is dedicated to this calculation.

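Proposition 5.2. Let $W_\psi$ be the limit in law of the random variable $n \operatorname{tr}(Q_c \widehat{P}_n)$, then

$$\mathbb{E}[W_\psi] = \operatorname{tr}\big(\mathbb{E}[ZZ^T \|Q_c Z\|^2 \psi(Y)^2]\,P_c (P_c M_\psi - I\lambda_\psi^*)^{-2}\big).$$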

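Proof. We have

$$\begin{aligned} Q_c \widehat{P}_n &= Q_c(\widehat{P}_n - P_c) \\ &= Q_c \oint_{\mathcal{C}} \big[(Iz - \widehat{M}_\psi)^{-1} - (Iz - M_\psi)^{-1}\big]\,dz \\ &= Q_c \oint_{\mathcal{C}} (Iz - \widehat{M}_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}\,dz, \end{aligned}$$

and then, we derive

$$\begin{aligned} Q_c \widehat{P}_n ={}& Q_c \oint_{\mathcal{C}} (Iz - M_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}\,dz \\ &+ Q_c \oint_{\mathcal{C}} (Iz - \widehat{M}_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}\,dz. \end{aligned} \tag{5.9}$$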

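Consider the trace of the first term of equation (5.9). Since $Q_c$ and $(Iz - M_\psi)^{-1}$ commute, we have

$$\operatorname{tr}\Big(Q_c \oint_{\mathcal{C}} (Iz - M_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}\,dz\Big) = \operatorname{tr}\Big((\widehat{M}_\psi - M_\psi) \oint_{\mathcal{C}} Q_c (Iz - M_\psi)^{-2}\,dz\Big).$$

Besides, it is clear that

$$Q_c (Iz - M_\psi)^{-1} = \frac{Q_c}{z - \lambda_\psi^*}, \tag{5.10}$$

and recalling that $\lambda_\psi^*$ is outside $\mathcal{C}$, we have $\oint_{\mathcal{C}} \frac{Q_c}{(z - \lambda_\psi^*)^2}\,dz = 0$ and then (5.9) implies that

$$\operatorname{tr}(Q_c \widehat{P}_n) = \operatorname{tr}\Big(Q_c \oint_{\mathcal{C}} (Iz - \widehat{M}_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}(\widehat{M}_\psi - M_\psi)(Iz - M_\psi)^{-1}\,dz\Big).$$

Denote by $\Delta$ the limit in law of $\sqrt{n}(\widehat{M}_\psi - M_\psi)$. Since $\widehat{M}_\psi$ goes to $M_\psi$ in probability, Slutsky's theorem implies the convergence in distribution of $n \operatorname{tr}(Q_c \widehat{P}_n)$ to $W_\psi$ with

$$W_\psi = \operatorname{tr}\Big(Q_c \oint_{\mathcal{C}} (Iz - M_\psi)^{-1}\Delta (Iz - M_\psi)^{-1}\Delta (Iz - M_\psi)^{-1}\,dz\Big).$$

Here we use equation (5.10) to derive

$$W_\psi = \operatorname{tr}\Big(Q_c \Delta \oint_{\mathcal{C}} \frac{(Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz\,\Delta Q_c\Big), \tag{5.11}$$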
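and the integral inside (5.11) can be calculated in the following way. Splitting it into two terms and using (5.10), we obtain

$$\begin{aligned} \oint_{\mathcal{C}} \frac{(Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz &= \oint_{\mathcal{C}} \frac{P_c (Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz + \oint_{\mathcal{C}} \frac{Q_c (Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz \\ &= \oint_{\mathcal{C}} \frac{P_c (Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz + Q_c \oint_{\mathcal{C}} \frac{dz}{(z - \lambda_\psi^*)^3}; \end{aligned}$$

the last term in the previous equation is clearly equal to 0. Concerning the first term, since for all $k = 1, \dots, d$, we have

$$\begin{aligned} \oint_{\mathcal{C}} \frac{(Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz\;\eta_k &= \oint_{\mathcal{C}} \frac{dz}{(z - \lambda_\psi(\eta_k))(z - \lambda_\psi^*)^2}\;\eta_k \\ &= \frac{\eta_k}{(\lambda_\psi(\eta_k) - \lambda_\psi^*)^2} \\ &= P_c (P_c M_\psi - I\lambda_\psi^*)^{-2}\eta_k, \end{aligned}$$

and since all the vectors in $E_c^\perp$ belong to the kernel of this matrix, we get

$$\oint_{\mathcal{C}} \frac{P_c (Iz - M_\psi)^{-1}}{(z - \lambda_\psi^*)^2}\,dz = P_c (P_c M_\psi - I\lambda_\psi^*)^{-2}.$$

Injecting it in (5.11), this leads us to

$$W_\psi = \operatorname{tr}\big(\Delta Q_c \Delta P_c (P_c M_\psi - I\lambda_\psi^*)^{-2}\big),$$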

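and it remains to calculate its expectation. The linearity condition implies that $Q_c M_\psi P_c = 0$, and we have

$$\begin{aligned} \mathbb{E}[\Delta Q_c \Delta P_c] &= \lim_{n \to \infty} n\,\mathbb{E}\big[(\widehat{M}_\psi - M_\psi) Q_c \widehat{M}_\psi P_c\big] \\ &= \lim_{n \to \infty} n\,\mathbb{E}\big[\widehat{M}_\psi Q_c \widehat{M}_\psi P_c\big] \\ &= \mathbb{E}\big[ZZ^T P_c \|Q_c Z\|^2 \psi(Y)^2\big], \end{aligned}$$

which completes the proof of the proposition.

□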
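The contour-integral representation of the projector used in this proof can be illustrated numerically. The sketch below is not taken from the paper: it applies the standard formula $P = \frac{1}{2i\pi}\oint_{\mathcal{C}} (Iz - M)^{-1}dz$ on a circle, discretized with the trapezoidal rule, to a made-up symmetric matrix whose repeated eigenvalue (playing the role of $\lambda_\psi^*$) lies outside the circle. The function name and the test matrix are assumptions.

```python
import numpy as np

def contour_projector(M, center, radius, n_nodes=200):
    """Approximate (1 / 2 i pi) * integral over a circle of (zI - M)^{-1} dz.

    For a symmetric M, the result is the orthogonal projector onto the
    eigenvectors whose eigenvalues lie inside the circle.
    """
    p = M.shape[0]
    total = np.zeros((p, p), dtype=complex)
    for theta in 2 * np.pi * np.arange(n_nodes) / n_nodes:
        offset = radius * np.exp(1j * theta)
        z = center + offset
        # dz = i * offset * dtheta; with dtheta = 2 pi / n_nodes the factor
        # 1 / (2 i pi) leaves the weight offset / n_nodes for each node.
        total += offset * np.linalg.inv(z * np.eye(p) - M)
    return (total / n_nodes).real

rng = np.random.default_rng(1)
V = np.linalg.qr(rng.standard_normal((5, 5)))[0]
eigenvalues = np.array([3.0, 2.5, 1.0, 1.0, 1.0])   # 1.0 plays the role of lambda*
M = V @ np.diag(eigenvalues) @ V.T
P = contour_projector(M, center=2.75, radius=1.0)   # encloses 3.0 and 2.5 only
```

Here `P` coincides, up to rounding, with `V[:, :2] @ V[:, :2].T`, the projector onto the eigenvectors associated with the eigenvalues different from the repeated one.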



Proposition 5.2 provides the expression of the quantity to minimize with respect to the function $\psi$. The next lines are devoted to finding $\psi$ such that $\mathbb{E}[W_\psi]$ is minimal. This informal calculation leads us to a fixed point equation whose solution is expected to be the minimum of $\mathbb{E}[W_\psi]$. Thanks to Proposition 5.2 the quantity to minimize can be written as

$$\mathbb{E}[W_\psi] = \operatorname{tr}\big(\mathbb{E}[ZZ^T P_c \|Q_c Z\|^2 \psi(Y)^2]\,(P_c M_\psi - I\lambda_\psi^*)^{-2}\big),$$

or with the notations $A = ZZ^T P_c \|Q_c Z\|^2$ and $B = P_c ZZ^T - \frac{\|Q_c Z\|^2}{p-d} I$,

$$\mathbb{E}[W_\psi] = \operatorname{tr}\big(\mathbb{E}[A\psi(Y)^2]\,\mathbb{E}[B\psi(Y)]^{-2}\big).$$
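Thus we are looking for $\psi$ such that for every bounded measurable function $\delta$,

$$\frac{d}{dt}\,\mathbb{E}[W_{\psi + t\delta}]\Big|_{t=0} = 0,$$

or equivalently,

$$\mathbb{E}\Big[2\operatorname{tr}\big(A\delta\psi\,\mathbb{E}[B\psi]^{-2}\big) - \operatorname{tr}\big(\mathbb{E}[A\psi^2]\,\mathbb{E}[B\psi]^{-1}\{B\delta\,\mathbb{E}[B\psi]^{-1} + \mathbb{E}[B\psi]^{-1}B\delta\}\,\mathbb{E}[B\psi]^{-1}\big)\Big] = 0,$$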
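where $\delta$ and $\psi$ stand for $\delta(Y)$ and $\psi(Y)$. Define the functions $A(Y) = \mathbb{E}[A|Y]$ and $B(Y) = \mathbb{E}[B|Y]$. Since the previous equation should hold for any $Y$-measurable random variable $\delta(Y)$, we derive

$$2\operatorname{tr}\big(A(Y)\psi(Y)\,\mathbb{E}[B\psi]^{-2}\big) - \operatorname{tr}\big(\mathbb{E}[A\psi^2]\,\mathbb{E}[B\psi]^{-1}\{B(Y)\,\mathbb{E}[B\psi]^{-1} + \mathbb{E}[B\psi]^{-1}B(Y)\}\,\mathbb{E}[B\psi]^{-1}\big) = 0 \quad \text{a.s.},$$

which leads to the implicit equation

$$\psi(y) = \frac{\operatorname{tr}\big(\mathbb{E}[B\psi]^{-1}\,\mathbb{E}[A\psi^2]\,\mathbb{E}[B\psi]^{-1}\{\mathbb{E}[B\psi]^{-1}B(y) + B(y)\,\mathbb{E}[B\psi]^{-1}\}\big)}{2\operatorname{tr}\big(A(y)\,\mathbb{E}[B\psi]^{-2}\big)}. \tag{5.12}$$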

This solution describes the optimal function $\psi$ to perform TF2. To find this $\psi$ we propose an iteration of the fixed point equation (5.12). Before we state a more precise algorithm to compute TF2, we set out another way to express (5.12). As we highlighted at the beginning of the section, based on Theorem 4.3 we suppose that $\psi$ is such that we can reach the eigenvectors $\eta_\psi = (\eta_1, \dots, \eta_d) \in \mathbb{R}^{p \times d}$ of $M_\psi$ that are in $E_c$. Therefore we can write $P_c = \eta_\psi \eta_\psi^T$ and by definition of $\eta_\psi$, we have

$$\mathbb{E}[B\psi(Y)]^{-1}\eta_\psi = \eta_\psi D_\psi, \tag{5.13}$$

where $D_\psi = \operatorname{diag}_k(\lambda_\psi(\eta_k) - \lambda_\psi^*)^{-1}$. Besides, a simple use of the linearity condition provides that $\mathbb{E}[\eta^T ZZ^T | Y] = \mathbb{E}[\eta^T ZZ^T P_c | Y]$ for every $\eta \in E_c$. Consequently, we derive that

$$\eta_\psi^T B(y) = \eta_\psi^T B(y) P_c. \tag{5.14}$$

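Then with the introduced notations and using (5.13) and (5.14), we obtain this other formulation of (5.12),

$$\psi(y) = \frac{\operatorname{tr}\big(D_\psi A_\psi D_\psi\{D_\psi \widetilde{B}(y) + \widetilde{B}(y) D_\psi\}\big)}{2\operatorname{tr}\big(\widetilde{A}(y) D_\psi^2\big)},$$

where

$$A_\psi = \mathbb{E}\big[\eta_\psi^T ZZ^T \eta_\psi \|Q_c Z\|^2 \psi(Y)^2\big], \quad \widetilde{A}(y) = \eta_\psi^T A(y)\eta_\psi, \quad \widetilde{B}(y) = \eta_\psi^T B(y)\eta_\psi,$$

are $d \times d$ matrices. Using the symmetry of the matrices $A_\psi$ and $\widetilde{B}(y)$, and some well-known properties of the trace, we obtain

$$\psi(y) = \frac{\operatorname{tr}\big(D_\psi A_\psi D_\psi \widetilde{B}(y) D_\psi\big)}{\operatorname{tr}\big(\widetilde{A}(y) D_\psi^2\big)}. \tag{5.15}$$

Since $\widetilde{A}$ and $\widetilde{B}$ are unknown functions, we use a slicing approximation and it gives

$$\alpha_h = \frac{\operatorname{tr}\big(D_\psi A_\psi D_\psi B_h D_\psi\big)}{\operatorname{tr}\big(A_h D_\psi^2\big)}, \tag{5.16}$$

where $A_h = \mathbb{E}[\widetilde{A}(Y)\mathbf{1}_{\{Y \in I(h)\}}]$ and $B_h = \mathbb{E}[\widetilde{B}(Y)\mathbf{1}_{\{Y \in I(h)\}}]$. Now we set out the TF2 method based on the family of indicator functions. In practice, the fixed point equation (5.15) gives better results than (5.12), therefore we use (5.16) to compute TF2. We propose the following algorithm that describes the iteration needed to implement our method. For clarity, we base the algorithm on the weights $\alpha_h$ instead of the function $\psi(y) = \sum_h \alpha_h \mathbf{1}_{\{y \in I(h)\}}$. Besides, $A_\psi$ and $D_\psi$ are noted $A$ and $D$, and we will need

$$M_h = \mathbb{E}[ZZ^T \mathbf{1}_{\{Y \in I(h)\}}] \quad \text{and} \quad \lambda_h = \mathbb{E}\left[\frac{\|Q_c Z\|^2}{p-d}\mathbf{1}_{\{Y \in I(h)\}}\right].$$

Because $\lambda_h$ is the eigenvalue associated with the space $E_c^\perp$, we estimate it in the following way, supposing that $\dim(E_c) < \dim(E_c^\perp)$.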
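TF2 Algorithm:

(0) Each $I(h)$ contains $n/H$ observations. Compute

$$\widehat{M}_h = \frac{1}{n}\sum_{i=1}^{n} Z_i Z_i^T \mathbf{1}_{\{Y_i \in I(h)\}}, \qquad \widehat{\lambda}_h = \operatorname{median}(\lambda \in \operatorname{spectrum}(\widehat{M}_h)),$$

and initialize $\alpha_h \sim \mathcal{U}[0,1]$ for every $h = 1, \dots, H$.

(1) Identify the eigenvectors $\widehat{\eta} = (\widehat{\eta}_1, \dots, \widehat{\eta}_d) \in E_c$ of $\widehat{M} = \sum_h \alpha_h \widehat{M}_h$.

(2) Derive $\widehat{D} = \operatorname{diag}_k(\widehat{\lambda}(\widehat{\eta}_k) - \widehat{\lambda}^*)^{-1}$, $\widehat{Q}_c = I - \widehat{\eta}\widehat{\eta}^T$ and

$$\widehat{A} = \sum_h \alpha_h^2\,\widehat{\eta}^T \widehat{A}_h \widehat{\eta}, \quad \text{with} \quad \widehat{A}_h = \frac{1}{n}\sum_{i=1}^{n} Z_i Z_i^T \|\widehat{Q}_c Z_i\|^2 \mathbf{1}_{\{Y_i \in I(h)\}}.$$

(3) Compute

$$\alpha_h = \frac{\operatorname{tr}\big(\widehat{D}^2 \widehat{A} \widehat{D}\,(\widehat{\eta}^T \widehat{M}_h \widehat{\eta} - \widehat{\lambda}_h I)\big)}{\operatorname{tr}\big(\widehat{D}^2\,\widehat{\eta}^T \widehat{A}_h \widehat{\eta}\big)}.$$

Repeat the last three steps until convergence is achieved. The resulting function $\widehat{\psi}$ is an estimate of the solution of the fixed point equation. Finally, the set of vectors $\widehat{\eta}$ forms an estimated basis of the standardized CS. The space generated by $\widehat{\Sigma}^{-1/2}\widehat{\eta}$ provides an estimation of the CS by TF2.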

Remark 5. A crucial point deserves our attention. It concerns the way we identify the eigenvectors of $M_\psi$ that belong to $E_c$ and a fortiori their associated eigenvalues. It intervenes at each iteration of our algorithm to estimate $D_\psi$ and $\eta_\psi$. The theoretical background of the TF2 method advocates for an identification process based on the eigenvalues rather than the eigenvectors. Indeed, as pointed out in a previous section, the eigenvalues of $M_\psi$ associated with eigenvectors of $E_c^\perp$ are all equal. We tried to base an algorithm on this fact but it appeared not to be robust to small samples. We therefore chose to develop another one which takes into account the nature of the eigenvectors of $M_\psi$. Let $\eta$ be an eigenvector of $M_\psi$; we base a new identification process on the dependence between $(\eta^T Z)$ and $Y$. We propose to compare the Pearson's chi-square statistics of the test of independence between $(\eta^T Z)$ and $Y$. Therefore, for each eigenvector we divide the range of $(\eta^T Z)$ into $H$ slices noted $J(h)$ and we calculate

$$S(\eta) = \sum_{h,h'} \frac{(\widehat{p}_{h,h'} - \widehat{p}_{h,\cdot}\,\widehat{p}_{\cdot,h'})^2}{\widehat{p}_{h,\cdot}\,\widehat{p}_{\cdot,h'}}, \tag{5.17}$$

where $\widehat{p}_{h,h'} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\{Y_i \in I(h)\}}\mathbf{1}_{\{(\eta^T Z_i) \in J(h')\}}$, and $\widehat{p}_{h,\cdot}$, $\widehat{p}_{\cdot,h'}$ are the corresponding marginal frequencies. The $d$ eigenvectors of $\widehat{M}$ associated with the largest values of $S$ are identified as being in $E_c$. As a consequence, at step (2) of the TF2 Algorithm, the $\widehat{\lambda}(\widehat{\eta}_k)$'s are the eigenvalues of $\widehat{M}$ associated with the eigenvectors $\widehat{\eta}_k$'s with the $d$ largest values of $S$, and $\widehat{\lambda}^*$ is the median over the other eigenvalues. In the next section dedicated to simulations, criterion (5.17) has been used to compute TF2.
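The identification step of Remark 5 can be sketched as below. This is an illustrative reading of the remark, not the authors' code: the function names are hypothetical, and both the response and each projection $\eta^T Z$ are cut into equal-count slices, which is an assumption for the projections since the text only says that their range is divided into $H$ slices.

```python
import numpy as np

def equal_count_labels(v, H):
    """Slice labels 0..H-1 with (nearly) the same number of points per slice."""
    ranks = np.argsort(np.argsort(v))
    return (ranks * H) // len(v)

def chi2_dependence(scores, Y, H):
    """Pearson chi-square type statistic between the slices of Y and of scores."""
    a = equal_count_labels(Y, H)
    b = equal_count_labels(scores, H)
    joint = np.zeros((H, H))
    np.add.at(joint, (a, b), 1.0)        # contingency table of slice pairs
    joint /= len(Y)                      # joint frequencies p_{h,h'}
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return np.sum((joint - expected) ** 2 / expected)

def select_eigenvectors(M_hat, Z, Y, d, H):
    """Keep the d eigenvectors of M_hat whose projections depend most on Y.

    Returns the selected eigenvectors, their eigenvalues, and the median of
    the remaining eigenvalues (used as the estimate of lambda*).
    """
    w, V = np.linalg.eigh(M_hat)
    stats = np.array([chi2_dependence(Z @ V[:, k], Y, H) for k in range(len(w))])
    keep = np.argsort(stats)[-d:]
    return V[:, keep], w[keep], np.median(np.delete(w, keep))
```

The statistic is used only to rank the eigenvectors, so the multiplicative factor $n$ of the usual chi-square statistic is omitted.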

6. Simulations 

In this section, we first compare the performance of the order 1 test function variance minimization with the performance of the SIR estimator. Then, we compare some order 2 methods on models that are pathological for order 1 methods (see Example 3.3). To measure the performance of a method we evaluate the error between the CS and its estimate with the following distance: for two subspaces $E_1$ and $E_2$, if $P_1$ and $P_2$ are their respective orthogonal projections, the distance between $E_1$ and $E_2$ is

$$\operatorname{Dist}(E_1, E_2) = \|P_1 - P_2\|^2, \tag{6.1}$$

where $\|\cdot\|$ stands for the Frobenius norm.
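As an illustration, the distance (6.1) can be computed from any two bases as follows. The helper names are hypothetical, and each basis is assumed to be given as a two-dimensional array with linearly independent columns.

```python
import numpy as np

def projector(B):
    """Orthogonal projector onto the column space of B (full column rank)."""
    Q, _ = np.linalg.qr(B)       # orthonormal basis of span(B)
    return Q @ Q.T

def subspace_distance(B1, B2):
    """Squared Frobenius norm of the difference of the two projectors."""
    return np.linalg.norm(projector(B1) - projector(B2), ord="fro") ** 2
```

The distance does not depend on the chosen bases, only on the spanned subspaces. Two orthogonal lines, for example, are at distance 2.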
Figure 1. Comparison of TF1 and SIR when X has a spherical distribution (panels: Model I, Model II).

Besides, since TF1 and TF2 are performed with the family of indicator functions, we have to discretize the response into H slices. The slices are built in such a way that each slice contains the same number of observations.

6.1. Order 1 test function. Let us consider the case where the predictors have a Gaussian distribution. Clearly $P_c Z$ and $Q_c Z$ are two independent random vectors and then $\mathbb{E}[\|Q_c Z\|^2 | Y] = \mathbb{E}[\mathbb{E}[\|Q_c Z\|^2 | P_c Z] | Y] = p - d$. Therefore $\operatorname{span}(M_{TF1}) = \operatorname{span}(M_{SIR})$ and TF1 provides exactly the same estimator as SIR. Simulations made in this case highlight the similarity between both methods and are not presented here.

Consequently, to point out the differences between these two methods, we generate non-Gaussian predictors, taking $X = \rho U$ where $U$ is a uniformly distributed vector on the unit sphere of $\mathbb{R}^p$, independent of $\rho$, which is a real random variable. A first point is that $X$ has a spherical distribution. Moreover, we take

$$\rho = \epsilon\,|10 + 0.05 W_1| + (1 - \epsilon)\,|30 + 0.05 W_2|, \tag{6.2}$$

with Wi ~ AA(0, 1), W2 ~ AA(0, 1) and e We performed SIR and TFl on the following 

two models. Model I is derived from Li ( 199ll ) and considered in many articles on the subject. 



Model II: Y = sign(X2)|Xi/2 + 5| + 0.5e, 

where $e \sim \mathcal{N}(0, 1)$. We have to standardize $X$ into $Z$ to compute TF1. Clearly, the variance of $X$ is proportional to the identity matrix, so the standardized directions are the same as the non-standardized ones. For Models I and II, the directions to estimate are $(1, 0, \dots, 0)^T$ and $(0, 1, 0, \dots, 0)^T$.
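The spherical design $X = \rho U$ with $\rho$ given by (6.2) can be simulated along the following lines. This is only a sketch: the function name is hypothetical, and since the distribution of the mixing variable $\epsilon$ is not recoverable from the text above, it is left as an input of 0/1 values supplied by the caller instead of being drawn inside the function.

```python
import numpy as np

def spherical_predictors(eps, p, rng):
    """Draw X = rho * U with rho as in (6.2).

    eps : array of 0/1 mixing indicators (its distribution is an input,
          it is NOT specified here).
    p   : dimension of the predictors.
    rng : a numpy random Generator.
    """
    eps = np.asarray(eps, dtype=float)
    n = len(eps)
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    G = rng.standard_normal((n, p))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)
    W1 = rng.standard_normal(n)
    W2 = rng.standard_normal(n)
    rho = eps * np.abs(10 + 0.05 * W1) + (1 - eps) * np.abs(30 + 0.05 * W2)
    return rho[:, None] * U
```

By construction the norm of each row equals the corresponding radius, which concentrates around 10 or 30 depending on the value of the indicator.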

For a broader view, for each model we run both methods with several configurations of the parameters $(n, p, H)$, taken as (100, 6, 5), (500, 10, 10) and (1000, 20, 20). For each configuration, we generate 100 random samples. Boxplots of the distances measured between the estimated and the true CS are presented in Figure 1.

For each model and in all the parameter configurations, TF1 performs better than SIR. Model II reflects a suitable situation for order 1 methods because its regression function is not symmetric with respect to any of its coordinates. As a consequence the measured errors are quite small for both methods. Model I indicates a more difficult situation. Indeed, because the
standard error of X2 is near 16 ^ 1.5, the regression function associated to model I is almost 
symmetric with respect to its second coordinate (see model (3.3)). It appears that both methods have difficulties in finding this coordinate. Figure 1 shows that in each situation the difference between the performance of both methods increases with the sample size. Nevertheless, because of the high level of similarity between the theoretical background of these two methods, the distances presented are really close, especially for n = 100, where the improvement of TF1 is not really significant.

Figure 2. Comparison of TF1 and SIR when there is nonlinearity between the predictors (panel: Model III).

To relate to a point of view developed in the simulation study of Cook and Ni (2005), we are interested in the link between the variation of $\operatorname{var}(Z|Y)$ and the performance of the presented method. First, according to equation (5.8), the variation of the random variable $\mathbb{E}[\|Q_c Z\|^2 | Y]$ is essential in studying the differences between SIR and TF1. Indeed if this one is a constant, then $d_h = \mathbb{E}[\|Q_c Z\|^2 \mathbf{1}_{\{Y \in I(h)\}}] = (p - d) p_h$ and TF1 is the same method as SIR. Consequently, SIR estimates are near optimal with respect to criterion (6.1) when the variations of $\mathbb{E}[\|Q_c Z\|^2 | Y]$ are near 0. Besides, if this random variable is nonconstant then the $d_h$ vary as well and the differences between both methods are highlighted. Secondly, we can notice that $\mathbb{E}[\|Q_c Z\|^2 | Y]$ and $\operatorname{var}(Z|Y)$ are strongly linked. Thanks to the well-known variance decomposition formula, we have

$$\operatorname{var}(Z|Y) = \mathbb{E}[\operatorname{var}(Z | P_c Z) | Y] + \operatorname{var}(\mathbb{E}[Z | P_c Z] | Y),$$

and using the linearity condition, we obtain that

$$\operatorname{tr}(\operatorname{var}(Z|Y)) = \mathbb{E}[\|Q_c Z\|^2 | Y] + \operatorname{tr}(\operatorname{var}(P_c Z | Y)).$$

Thus, as was the case to distinguish IRE from SIR, it seems that the variations of $\operatorname{var}(Z|Y)$ have an important role in differentiating TF1 from SIR.

As has been studied in some recent papers like Li and Dong (2009) and Dong and Li (2010), we introduce nonlinearity in the distribution of the predictors. Although it does not correspond to the set of assumptions required in the SIR and TF1 theoretical background, it is interesting to provide the following results as an indicator of the robustness of each method. Here, predictors are generated as previously but we change $X_1$ and $X_2$ as follows,

Xi = 0.2X3 + 0.2(^4 + Wf + 0.2u, 

X2 = 0.1 + 0.1(^3 +X4) + 0.3X| +0.2u, 

where $u \sim \mathcal{N}(0, 1)$. Model III is the same as Model I but with the above predictor distribution. We provide boxplots of the estimation error of the 100 simulated random samples in Figure 2.

In this case, Figure 2 shows a large difference between the estimation errors of SIR and TF1. TF1 performs better in each case and the difference between both methods increases with n.

6.2. Order 2 test function. Symmetric model. We now compare several well-known order 2 dimension reduction methods with TF2. The order 2 methods we have computed include SAVE, pHd, SIR-II and DR. For the models we consider here, pHd and SIR-II do not work as well as the others. Therefore we focus on a comparison between SAVE, DR and TF2.

Figure 3. Comparison of TF2, SAVE and DR when X has a Gaussian distribution.

TF2 estimation is not as close to DR and SAVE as the TF1 estimation is close to SIR. The following simulations highlight this fact and as a consequence we begin this section by providing the results obtained with Gaussian predictors. We consider the following three regression models; note that Model V is derived from Li and Wang (2007).

Model IV: $Y = 4\tanh\left(\dfrac{|X_1|}{2}\right) + 0.5e$

Model V: $Y = 0.4X_1^2 + \sqrt{|X_2|} + 0.2e$

Model VI: Y = l.SXiXs e 

with $e \sim \mathcal{N}(0, 1)$ and $X \sim \mathcal{N}(0, I_p)$. The CS of Model IV is spanned by the direction $(1, 0, \dots, 0)$, whereas in Models V and VI, it is a two-dimensional subspace generated by $(1, 0, \dots, 0)$ and $(0, 1, 0, \dots, 0)$. As in the simulations for order 1, we consider different parameter configurations where each of the presented methods is in a convenient situation. We simulate SAVE, DR and TF2 with $(n, p, H)$ equal to (100, 6, 5), (500, 10, 5) and (1000, 20, 10). For each configuration, 100 random samples have been simulated and the resulting boxplots with their averages are presented in Figure 3.

For all the selected models, TF2 performs better than DR and SAVE. The most significant improvement happens for Model IV, in which our method performs better than the others around 90% of the time in each $(n, p, H)$ configuration. Note that for n = 100 and n = 500, the mean error of TF2 is two times smaller than the mean error of DR or SAVE. For n = 1000 this factor goes to three. The results of the simulation for Model VI are really close to those for Model IV. Model V is more difficult for each method. Moreover, we have to wait until n = 1000 to observe substantial differences in the distribution of the criterion. In every model, the criterion mean of TF2 is the smallest, and the larger n is, the more substantial the improvement of TF2 looks. Besides, it is clear that for the selected models, SAVE and DR perform in a similar way.

Remark 6. For our study and the development of TF2, Model V was a really interesting one. In Figure 3, for n = 100 the mean is less than the median, and this is no longer the case for n larger than 100. This marked change in the boxplots is explained by the presence of small outliers in the first situation and large outliers in the second one. Indeed as n gets large, TF2 performs better, but the mean is shifted by the presence of outliers that reflect uncommon difficult situations. As explained in Section 5.2, TF2 relies on the way the eigenvectors of $M_\psi$ that belong to $E_c$ are identified. To make that possible, a test of independence between the response and the projected predictors is conducted. Outliers of Model V for n equal to 500 and 1000 are the consequence of a bad eigenvector choice made by this test. When n is sufficiently large this no longer occurs. When the TF2 algorithm is iterated a larger number of times, it happens only very rarely.

Figure 4. Comparison of TF2, SAVE and DR when X has a spherical distribution.

To conclude this simulation section we present the results obtained with spherical predictors. Here, $X$ is generated with the equation $X = \rho U$ where $U$ is a uniformly distributed vector on the unit sphere of $\mathbb{R}^p$, independent of $\rho$ defined by equation (6.2). Again we study Model IV and also the following ones,

Model VII: Y = \Xi\ + (^^^ + 0.5e 

Model VIb: Y = X1X2 e 

where $e \sim \mathcal{N}(0, 1)$. Model VI has been changed to reduce the signal-to-noise ratio. The directions to estimate, the parameter configurations and the number of simulated random samples are the same as in the Gaussian case studied previously. Boxplots and their associated averages are presented in Figure 4.

Model IV still shows the largest improvement of TF2 with respect to SAVE and DR. When n is large, it performs around eight times better than the others. In Model VIb, the TF2 estimation deteriorates when the distribution of the predictors changes from Gaussian to spherical. Finally, Model VII provides a new standard situation where the improvement of TF2 is highly significant.

7. Concluding remarks 

This article introduces the basis of a new methodology for SDR. Although the theoretical background of TF1 and TF2 is much the same as that of existing SDR methods, the proposed methods work under weaker conditions than those of the literature. Moreover, the resulting estimation methods are quite different. Indeed, the introduction of a transformation of the response was the original idea of this work and has led us to new lines of investigation in SDR. A surprising point was the similarity between SIR and the TF1 variance minimization. For TF2, the simulation study underlines its high accuracy compared with other order 2 methods and legitimates the use of TF. However, the framework developed here is not yet complete.

First, the estimation of the dimension of the CS has been avoided in the present work. Prospects can be found in the Pearson's chi-square statistic used to select a basis of the CS: this statistic could also be employed to estimate the dimension of the CS. Simulations of such a dimension estimation method have so far provided good results. Moreover, an idea still under development is to incorporate such a test in the TF2 algorithm.
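To make the idea of a Pearson chi-square test between the response and a projected predictor concrete, here is a minimal sketch. It is a generic independence statistic computed on quantile-binned variables; the binning scheme, the number of bins `k`, the function names and the toy model are our assumptions and are not taken from the TF2 algorithm.

```python
import numpy as np

def quantile_bins(v, k):
    """Assign each value of v to one of k bins delimited by empirical quantiles."""
    edges = np.quantile(v, np.linspace(0.0, 1.0, k + 1)[1:-1])
    return np.searchsorted(edges, v, side="right")  # labels in 0..k-1

def pearson_chi2(a, b, k):
    """Pearson chi-square statistic of the k x k contingency table of labels a and b."""
    table = np.zeros((k, k))
    np.add.at(table, (a, b), 1.0)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def independence_statistic(y, X, direction, k=4):
    """Statistic for independence between y and the projection of X on `direction`."""
    return pearson_chi2(quantile_bins(y, k), quantile_bins(X @ direction, k), k)

# Toy model (illustrative only): the response depends on the first coordinate.
rng = np.random.default_rng(1)
n, p = 2000, 4
X = rng.standard_normal((n, p))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)

stat_dep = independence_statistic(y, X, np.eye(p)[0])  # informative direction
stat_ind = independence_statistic(y, X, np.eye(p)[1])  # uninformative direction
```

Under independence, the statistic is approximately chi-square distributed with (k - 1)^2 degrees of freedom, so a direction can be flagged as informative when its statistic exceeds the corresponding quantile. Counting the directions that are flagged is one way such a statistic could feed a dimension estimate.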

Secondly, TF offers many different methods deriving from the choice of a family of functions that separates the points (see Remark 3 and Corollary 4.4). Here we focused on studying TF with indicators. The Fourier basis or a polynomial family could also be considered to derive new methods. Besides, for TF1 and TF2, a smooth kernel estimation of the function involved may lead to better convergence rates.
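The role played by the family of functions can be illustrated with a short sketch. It assumes that an order 1 method is built from sample versions of vectors of the form E[Z ψ(Y)], with Z the standardized predictor; this is our reading of the order 1 class, not the TF1 estimator itself (no optimal member is computed), and all function names and the toy model are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center X and whiten it so that its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    return Xc @ (V @ np.diag(w ** -0.5) @ V.T)

def indicator_family(y, n_slices=5):
    """Indicators of quantile slices of the response."""
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_slices + 1)[1:-1])
    labels = np.searchsorted(edges, y, side="right")
    return np.stack([(labels == h).astype(float) for h in range(n_slices)], axis=1)

def polynomial_family(y, degree=3):
    """Monomials of the standardized response."""
    t = (y - y.mean()) / y.std()
    return np.stack([t ** k for k in range(1, degree + 1)], axis=1)

def fourier_family(y, n_freq=2):
    """Cosines and sines of the standardized response."""
    t = (y - y.mean()) / y.std()
    cols = [f(k * t) for k in range(1, n_freq + 1) for f in (np.cos, np.sin)]
    return np.stack(cols, axis=1)

def order1_directions(Z, psi):
    """Eigenvectors of sum_k v_k v_k^T, where v_k is the sample mean of Z * psi_k(Y)."""
    V = Z.T @ psi / Z.shape[0]            # one column v_k per function psi_k
    eigval, eigvec = np.linalg.eigh(V @ V.T)
    return eigvec[:, ::-1]                # sorted by decreasing eigenvalue

# Toy single-index model (illustrative only): the direction to recover is e_1.
rng = np.random.default_rng(2)
n, p = 5000, 5
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.5 * rng.standard_normal(n)
Z = standardize(X)

families = {"indicator": indicator_family, "polynomial": polynomial_family,
            "fourier": fourier_family}
# Absolute first coordinate of the leading direction, for each family.
leading = {name: abs(order1_directions(Z, fam(y))[0, 0]) for name, fam in families.items()}
```

In this simple model the three families point to the same direction. They would be expected to behave differently on harder models, which is where the choice of the family matters.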

Finally, we have a few words about a set of methods called hybrid. Some regression functions have different kinds of components. Consequently, in many cases a particular method would provide a good estimate of some components but another one would be needed to infer the remaining components. This clearly argues for methods that mix the existing ones. Such methods are usually called hybrid methods; they can be summarized by the equation

$$M = aM_1 + (1 - a)M_2,$$

where $M_1$ and $M_2$ are the matrices associated with two different methods. A spectral decomposition of $M$ gives a hybrid estimation of the CS. This kind of consideration was recommended by Gannoun and Saracco, and Ye and Weiss (2003) proposed a bootstrap method to select the parameter $a$. This includes the combination of SIR and SAVE, SIR and pHd, and SIR and SIR-II. Besides, it is commonly known that

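$$M_{\mathrm{SAVE}} = M_{\mathrm{SIR}}^2 + M_{\mathrm{SIR\text{-}II}},$$

and that

$$M_{\mathrm{DR}} = E\big[(E[ZZ^T \mid Y] - I)^2\big] + M_{\mathrm{SIR}}^2 + \mathrm{tr}(M_{\mathrm{SIR}})\, M_{\mathrm{SIR}},$$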

making SAVE and DR combinations of SIR and methods based on order 2 moments. Therefore SAVE and DR do not only involve order 2 moments of the predictors given the response. Thus it seems more realistic to develop hybrid methods based on the TF1 and TF2 matrices; specifically, the parameter $a$ could be chosen by optimizing a well-chosen criterion, as has been done independently in TF1 and TF2. Work along this line is in progress.
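The decomposition of the SAVE matrix into the squared SIR matrix plus the SIR-II matrix can be checked numerically, and the same matrices can be used to form a hybrid matrix. The sketch below uses slice-based sample versions of the three matrices; the slicing scheme, the toy model, the function names and the value a = 0.5 are illustrative assumptions (in particular, no data-driven choice of a is made).

```python
import numpy as np

def standardize(X):
    """Center X and whiten it so that its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    return Xc @ (V @ np.diag(w ** -0.5) @ V.T)

def slice_labels(y, n_slices):
    """Assign each response to one of n_slices quantile slices."""
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_slices + 1)[1:-1])
    return np.searchsorted(edges, y, side="right")

def sliced_matrices(Z, labels):
    """Slice-based sample versions of M_SIR, M_SIR-II and M_SAVE."""
    n, p = Z.shape
    eye = np.eye(p)
    weights, means, covs = [], [], []
    for h in np.unique(labels):
        Zh = Z[labels == h]
        m = Zh.mean(axis=0)
        weights.append(Zh.shape[0] / n)                     # slice proportion
        means.append(m)                                     # estimate of E[Z | slice]
        covs.append((Zh - m).T @ (Zh - m) / Zh.shape[0])    # estimate of Var(Z | slice)
    mean_cov = sum(w * V for w, V in zip(weights, covs))
    M_sir = sum(w * np.outer(m, m) for w, m in zip(weights, means))
    M_sir2 = sum(w * (V - mean_cov) @ (V - mean_cov) for w, V in zip(weights, covs))
    M_save = sum(w * (eye - V) @ (eye - V) for w, V in zip(weights, covs))
    return M_sir, M_sir2, M_save

# Toy model (illustrative only) with one linear and one symmetric component.
rng = np.random.default_rng(3)
n, p = 4000, 5
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] ** 2 + 0.2 * rng.standard_normal(n)
Z = standardize(X)
M_sir, M_sir2, M_save = sliced_matrices(Z, slice_labels(y, 8))

# Largest entrywise gap in M_SAVE = M_SIR^2 + M_SIR-II (zero up to rounding).
identity_gap = np.abs(M_save - (M_sir @ M_sir + M_sir2)).max()

# Hybrid matrix M = a M_1 + (1 - a) M_2 with an arbitrary a, then spectral decomposition.
a = 0.5
M_hybrid = a * M_sir + (1 - a) * M_sir2
eigval, eigvec = np.linalg.eigh(M_hybrid)
basis = eigvec[:, ::-1][:, :2]   # two leading eigenvectors
```

With exactly whitened data and slice proportions as weights, the identity holds exactly for the sample matrices, because the weighted average of the within-slice covariances equals the identity minus the sample SIR matrix.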

APPENDIX 

The following lemma is a simplified version of a result about subspaces of non-invertible matrices (see Draisma (2006), Proposition 3).



Lemma .1. Let $M, N \in \mathbb{R}^{d \times d}$ and $\alpha_0 > 0$. If, for all $\alpha < \alpha_0$, $\mathrm{rank}(N + \alpha M) \le \mathrm{rank}(N)$, then

$$M \ker(N) \subset \mathrm{Im}(N).$$

Proof. Denote by $P_\alpha$ the characteristic polynomial of $N + \alpha M$ and define $r_\alpha = \mathrm{rank}(N + \alpha M)$ and $k_\alpha = \dim(\ker(N + \alpha M)) = d - r_\alpha$. Because of the continuity of the determinant, the coefficients of $P_\alpha$ converge to the coefficients of $P_0$, and then $P_\alpha$ converges uniformly to $P_0$ on every compact set. By definition of $k_0$, $P_0$ is such that

$$P_0(x) = x^{k_0} Q_0(x) \quad \text{with } Q_0(0) \neq 0.$$

Now we use the uniform convergence. First, for $\alpha$ small enough we have $P_\alpha^{(k_0)}(0) \neq 0$, and this gives the upper bound $k_\alpha \le k_0$. Using the assumption we obtain $k_0 = k_\alpha$. Therefore, again from the uniform convergence, for some $\alpha_0$,

$$Q_\alpha(0) \neq 0, \quad \alpha < \alpha_0.$$

Clearly, there exists a contour $C$ such that none of the nonzero eigenvalues of $N + \alpha M$ belongs to $C$, $\alpha < \alpha_0$. Using the residue theorem, this allows us to recover the respective projections $\Pi_0$ and $\Pi_\alpha$ on the kernels of the matrices $N$ and $N + \alpha M$ in the following way

$$\Pi_0 = \int_C (N - zI)^{-1}\,dz, \quad \text{and} \quad \Pi_\alpha = \int_C (N + \alpha M - zI)^{-1}\,dz,$$

and we can see that

$$\Pi_0 - \Pi_\alpha = \alpha \int_C (N - zI)^{-1} M (N + \alpha M - zI)^{-1}\,dz.$$

Because, as $\alpha$ goes to 0, none of the eigenvalues of $N$ and $N + \alpha M$ crosses $C$, the integral converges and then we derive that $\Pi_\alpha \to \Pi_0$ as $\alpha \to 0$. Besides, we have

$$(N + \alpha M)\Pi_\alpha = 0, \quad \text{and} \quad N\Pi_0 = 0,$$

which lead us to $N(\Pi_0 - \Pi_\alpha) = \alpha M \Pi_\alpha$, and we obtain

$$\mathrm{Im}(M\Pi_\alpha) \subset \mathrm{Im}(N).$$

Using the continuity of $\Pi_\alpha$, we conclude the proof. □

Proposition .2. Let $\mathcal{M} \subset \mathbb{R}^{d \times d}$ be a linear subspace of noninvertible symmetric matrices. Then

$$\exists u \in \mathbb{R}^d, \quad \forall M \in \mathcal{M}, \quad u^T M u = 0.$$

Proof. Since $\mathcal{M}$ is a linear subspace, we can apply Lemma .1 with $N$ a matrix of maximal rank in $\mathcal{M}$ and any $M \in \mathcal{M}$. This gives, for every $M$ and every $u \in \ker(N)$,

$$Mu = Ny,$$

with $y \in \mathbb{R}^d$. Because $N$ is symmetric, multiplying on the left by $u^T$ gives $u^T M u = u^T N y = (Nu)^T y = 0$. □
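A small numerical illustration of the lemma and of the proposition may help. The three-parameter family of 3 x 3 symmetric matrices below is our own example (it is not taken from the paper): its first two rows are always collinear, so every member is singular, yet the kernels of its members do not coincide.

```python
import numpy as np

def member(a, b, c):
    """Element of a linear subspace of singular symmetric 3 x 3 matrices (our example)."""
    return np.array([[0.0, 0.0, a],
                     [0.0, 0.0, b],
                     [a,   b,   c]])

rng = np.random.default_rng(4)
N = member(*rng.standard_normal(3))   # generic member, of maximal rank 2
M = member(*rng.standard_normal(3))   # any other member

# Assumption of the lemma: rank(N + alpha M) <= rank(N) for small alpha.
rank_N = np.linalg.matrix_rank(N)
ranks = [np.linalg.matrix_rank(N + alpha * M) for alpha in np.linspace(-0.5, 0.5, 11)]

# Kernel of N: right singular vector associated with the zero singular value.
u = np.linalg.svd(N)[2][-1]
kernel_residual = np.linalg.norm(N @ u)

# Conclusion of the lemma: M u lies in Im(N), i.e. N y = M u has a solution y.
y_sol = np.linalg.lstsq(N, M @ u, rcond=None)[0]
image_residual = np.linalg.norm(N @ y_sol - M @ u)

# Conclusion of the proposition: u^T M' u = 0 for every member M' of the subspace.
quad_forms = [abs(u @ member(*rng.standard_normal(3)) @ u) for _ in range(100)]
max_quad_form = max(quad_forms)
```

Here the kernel of `member(a, b, c)` is spanned by (b, -a, 0), which changes with the member, so the vector u found from N is not a common kernel vector. It is nevertheless a common isotropic vector of the whole subspace, which is what the proposition asserts.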